INTRODUCTION

Depression is the leading cause of reversible disability among primary care patients and the major precursor to suicide; it contributes to the development and severity of chronic illnesses such as heart disease and diabetes and increases costs among affected patients with comorbid diseases.1–5 Veterans experience a high burden of depression, with approximately 12 % of Veterans attending Veterans Health Administration (VA) primary care practices having symptoms of major depression.6

The VA has invested substantially in evidence-based mental health care. A major initiative focused on implementing collaborative care management (CCM) for depression in primary care was incorporated into the VA Uniform Mental Health Services Handbook (2007),7 which mandated primary care/mental health integration nationally. It specifically identified co-located mental health specialists in primary care and CCM as requirements for all primary care sites with at least 5000 patients. The clinical goals of CCM are to ensure that primary care patients identified with depression are assessed, treated, followed frequently, and given self-management support, as indicated in national depression guidelines.8 Managing depression in primary care with CCM can significantly reduce depressive symptoms,9–11 lower the risk of depression recurrence,12 prevent prolonged disability,13 job loss,14–16 and negative life events,17 and reduce suicide rates.18

Yet achieving these favorable outcomes in routine clinical practice is challenging, and one barrier has been the lack of valid, reliable measures for evaluating improvement efforts.19–21 For many conditions, performance metrics have been a major force for improved care.22 The VA has required yearly screening for depression over the last decade and has used screening rates to monitor performance (as with HbA1c). In 2008, the VA introduced additional national performance measures to assess depression follow-up, only to withdraw them 2 years later. These measures encountered substantial resistance from primary care practice sites because site-level results, the level at which CCM improvements must be implemented, were difficult to interpret. Our goal, therefore, was to develop and validate prototype electronic measures suitable for evaluating the VA's CCM initiative from 2000 to 2010 at the primary care practice site level. To this end, we developed site-level measures that used only electronic data, followed patients longitudinally through detection and treatment, and reflected care for the full primary care population at a given primary care location.23, 24 We based the measures on depression guidelines and on measures used in prior CCM evaluations.14 We then reviewed the measures and our development methodology with an expert panel of VA and non-VA experts.

Measure development was guided by the Donabedian model of quality, which links healthcare structure, process, and outcomes.25 Prior work has shown a link between depression process measures derived from administrative data and hospitalization outcomes.26 Because timed, electronically documented symptom assessments were lacking, however, neither administrative review nor electronic measures can capture depression symptom outcomes such as reduction or elimination of symptoms. As in prior studies,27, 28 our work focuses on actionable, guideline-based depression care processes;29 guidelines in turn reflect the evidence linking these processes to outcomes.30 Our objectives were to 1) describe the development process and the challenges of developing prototype measures that capture the longitudinal course of clinical care from detection through treatment; 2) assess measure validity and limitations through an expert panel; and 3) examine the proportion of Veterans in VA primary care who met these depression quality measures (detection, follow-up, and minimally appropriate treatment) from 2000 to 2010.

METHODS

Design

We used a systematic approach to identify a population, determine inclusion and exclusion criteria for the cohorts, develop quality measures, and map how each patient was accounted for in each measure. Per the domains identified by Hermann,31 we expected our measures to be meaningful, feasible, and actionable for quality improvement. For meaningfulness, we relied on the literature, guidelines, and an expert panel. For feasibility, we assessed whether we could program measures that, using counting trees, accurately accounted for the full target population at each branch node of our measurement algorithm. For actionability, we relied on benchmarks from the literature, interpretability by our expert panel, and applicability to assessing CCM.

Expert Panel

In March 2015, we convened a 1-day modified Delphi expert panel17, 32 to review our development methods and the results of applying our measures. Panelists included VA and non-VA experts in quality and performance measurement, depression, primary care/mental health integration, and program evaluation. Prior to the meeting, panelists received a detailed report on measure development and results. They completed an online survey (available from the authors) evaluating sampling decisions and definitions and rating the importance and feasibility of electronic depression quality measurement. Survey results were presented to the panelists at the meeting. Summary notes were taken on a flip chart, and panelists voted on the summaries in real time. Two investigators took detailed notes.

Data

We used existing VA electronic medical record data from the National Patient Care Database and prescription data from the Pharmacy Benefits Management database to construct cohorts of all primary care patients for federal fiscal years (FY) 2000–2010 from nine Veterans Integrated Service Networks (VISNs). Because our larger project focused on the VA's implementation of CCM over the decade, we included four VISNs that implemented CCM early and five additional VISNs from across the United States, chosen to capture diverse levels of involvement with mental health primary care models and geographic diversity. The Greater Los Angeles Human Subjects Institutional Review Board approved this study.

Identification of the Population

Identifying the appropriate patient population to which quality measures are applied is a critical component of measure development. In our case, this challenge included identifying patients seen in primary care, determining the index visit, establishing continuity of care, and validating an algorithm to exclude patients with a recent prior diagnosis of or treatment for depression. We also had to consider the timing of visits, the timing of exclusion criteria, and cut-points for prescription medications and refills. Figure 1 illustrates the development process.

Figure 1. Identification of a new episode of depression for patients without a depression diagnosis or minimal treatment in the 6 months prior to the index visit: example of how a patient was identified in federal fiscal year (FY) 2005.

Primary Care Cohort and Index Visit

The cohort for each measure included all patients seen in primary care in each FY, 2000–2010. The baseline visit (the “index visit”) was a patient’s first primary care visit after the start of a given FY, identified using primary care visit encounter identifiers (for the VA, “clinic stop codes”).
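For illustration, a minimal sketch in Python/pandas of how an index visit might be derived from encounter records; the table and column names (encounters, patient_id, visit_date, stop_code) and the stop-code set are hypothetical placeholders, not the project's actual programming.

```python
import pandas as pd

# Hypothetical set of primary care clinic stop codes, for illustration only.
PC_STOP_CODES = {322, 323, 350}

def index_visits(encounters: pd.DataFrame, fy_start: str, fy_end: str) -> pd.DataFrame:
    """Return each patient's index visit: the first primary care visit after the
    start of the fiscal year. Assumes columns patient_id, visit_date, stop_code."""
    pc = encounters[
        encounters["stop_code"].isin(PC_STOP_CODES)
        & encounters["visit_date"].between(fy_start, fy_end)
    ]
    return (pc.sort_values("visit_date")
              .drop_duplicates("patient_id", keep="first")
              .rename(columns={"visit_date": "index_date"})
              [["patient_id", "index_date"]])
```

For FY2005, for example, fy_start and fy_end would be "2004-10-01" and "2005-09-30", since the federal fiscal year runs from October 1 through September 30.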

Continuously Seen Cohort and Home Site

The patient must have been seen at their primary care site at least once within the 12 months prior to the index visit (T0−12) and again within the 12 months after the index visit (T0+12). This definition of “continuously seen” allows sufficient time for follow-up and helps avoid truncated data. To assign each patient a home site, we used an algorithm similar to those used in a variety of primary care studies.33, 34 The algorithm stipulates that a patient’s “home site” is the site with the most primary care visits for that patient over the 2-year period. For ties, we used the site with the most recent visit, or, when the tied sites differed in complexity of services (e.g., a large medical center versus a smaller community-based outpatient clinic), we chose the smaller, less complex site.
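A sketch, under stated assumptions, of the continuously-seen test and the home-site assignment with its tie-break rules; column names (site_id, site_complexity) are hypothetical, and a lower site_complexity value is assumed to indicate a smaller, less complex site.

```python
import pandas as pd

def assign_home_site(pc_visits: pd.DataFrame) -> pd.Series:
    """Home site = site with the most primary care visits over the 2-year window;
    ties go to the site with the most recent visit, then to the less complex site.
    Assumes columns patient_id, site_id, visit_date, site_complexity."""
    per_site = (pc_visits
                .groupby(["patient_id", "site_id"], as_index=False)
                .agg(n_visits=("visit_date", "size"),
                     last_visit=("visit_date", "max"),
                     complexity=("site_complexity", "first")))
    ranked = per_site.sort_values(
        ["patient_id", "n_visits", "last_visit", "complexity"],
        ascending=[True, False, False, True])
    return ranked.drop_duplicates("patient_id").set_index("patient_id")["site_id"]

def continuously_seen(pc_visits: pd.DataFrame, index_df: pd.DataFrame) -> pd.Series:
    """Flag patients with >=1 primary care visit in the 12 months before the index
    visit (T0-12) and >=1 in the 12 months after (T0+12)."""
    v = pc_visits.merge(index_df[["patient_id", "index_date"]], on="patient_id")
    delta = v["visit_date"] - v["index_date"]
    v["before"] = (delta < pd.Timedelta(0)) & (delta >= pd.Timedelta(days=-365))
    v["after"] = (delta > pd.Timedelta(0)) & (delta <= pd.Timedelta(days=365))
    seen = v.groupby("patient_id")[["before", "after"]].any()
    return seen["before"] & seen["after"]
```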

Exclusion for Depression Diagnosis or Treatment in Prior 6 Months

To limit the measures to patients with a new episode of depression, we excluded patients who had a depression diagnosis (based on ICD-9 codes for depression, shown in Appendix 1, available online) or who had received minimally appropriate treatment in the 6 months prior to the index visit (T0−6). Minimally appropriate treatment was defined as ≥60 days of depression prescriptions (list of antidepressant drugs shown in Appendix 2, available online), ≥4 mental health visits (VA clinic stop codes shown in Appendix 3, available online), or ≥3 psychotherapy visits (Current Procedural Terminology [CPT] codes shown in Appendix 4, available online).
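A hedged sketch of this exclusion step; the helper tables (diagnoses, rx_fills, mh_visits, psy_visits), their column names, and the abbreviated ICD-9 list are illustrative assumptions only, with the full code lists in the online appendices.

```python
import pandas as pd

# Illustrative subset of depression ICD-9 codes; the full list is in Appendix 1.
DEPRESSION_ICD9 = {"296.2", "296.3", "300.4", "311"}

def excluded_at_baseline(index_df, diagnoses, rx_fills, mh_visits, psy_visits):
    """Flag patients to exclude: a depression ICD-9 code, or minimally appropriate
    treatment (>=60 days of antidepressant supply, >=4 mental health visits, or
    >=3 psychotherapy visits), in the 6 months before the index visit (T0-6)."""
    def in_window(events, date_col):
        m = events.merge(index_df[["patient_id", "index_date"]], on="patient_id")
        keep = (m[date_col] < m["index_date"]) & \
               (m[date_col] >= m["index_date"] - pd.DateOffset(months=6))
        return m[keep]

    prior_dx = in_window(diagnoses[diagnoses["icd9"].isin(DEPRESSION_ICD9)], "dx_date")
    rx_days = in_window(rx_fills, "fill_date").groupby("patient_id")["days_supply"].sum()
    n_mh = in_window(mh_visits, "visit_date").groupby("patient_id").size()
    n_psy = in_window(psy_visits, "visit_date").groupby("patient_id").size()

    excluded_ids = (set(prior_dx["patient_id"])
                    | set(rx_days[rx_days >= 60].index)
                    | set(n_mh[n_mh >= 4].index)
                    | set(n_psy[n_psy >= 3].index))
    return index_df["patient_id"].isin(excluded_ids)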

Measures

Based on the depression care literature, prior quality measures from the VA and the National Committee for Quality Assurance (NCQA), and depression guidelines, we developed four population-based quality measures for depression care that follow patients electronically over time (see Table 1; a schematic sketch of the measure logic follows the list below). Each reported measure uses as its denominator only the subset of patients to whom the measure applies. For example, detection is the proportion of patients with a newly detected episode of depression (numerator) among the eligible population of primary care patients without a recent depression diagnosis or minimally appropriate treatment (denominator). The proportions of patients with follow-up and with minimally appropriate treatment (numerators) use as their denominator all patients with a newly detected episode of depression. The ICD-9 codes, medications, stop codes, and CPT codes that we used are shown in Appendices 1, 2, 3, and 4, available online.

  • Measure 1: Detection of a new episode of depression: Detection was defined as a clinic visit with an ICD-9 code for depression or any antidepressant prescription in the 12-month period after the index visit.

  • Measures 2 & 3: Follow-up of patients with a new episode of depression: Following NCQA measures, we evaluated follow-up for a new depression diagnosis within 84 days and within 180 days. Appropriate follow-up was defined as ≥3 mental health (MH) visits, ≥3 psychotherapy visits, or ≥3 primary care visits with a depression ICD-9 diagnosis within 84 or 180 days of the newly detected episode.

  • Measure 4: Minimally appropriate treatment for patients with a new episode of depression: Minimally appropriate treatment was defined as ≥60 days of antidepressants, ≥4 MH visits, or ≥3 psychotherapy visits within the 12 months after detection. For prescriptions, we used the cut-point of ≥60 days of medication to indicate at least one refill. We excluded prescriptions with non-depression indications or keywords written in the dosing instructions, as well as prescriptions with a subtherapeutic dose (see Appendix 2, available online, for details).
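The schematic sketch referenced above, again with hypothetical table and column names, shows how the detection and follow-up numerators might be computed; the 84- and 180-day follow-up measures differ only in the window_days parameter. This is an illustration of the measure logic, not the project's actual code.

```python
import pandas as pd

DEPRESSION_ICD9 = {"296.2", "296.3", "300.4", "311"}  # illustrative subset (Appendix 1)

def detection_date(index_df, diagnoses, rx_fills):
    """Measure 1: earliest depression ICD-9 visit or antidepressant fill in the
    12 months after the index visit; NaT where no new episode was detected."""
    events = pd.concat([
        diagnoses.loc[diagnoses["icd9"].isin(DEPRESSION_ICD9), ["patient_id", "dx_date"]]
                 .rename(columns={"dx_date": "event_date"}),
        rx_fills[["patient_id", "fill_date"]].rename(columns={"fill_date": "event_date"}),
    ])
    m = events.merge(index_df[["patient_id", "index_date"]], on="patient_id")
    in_year = m[(m["event_date"] >= m["index_date"]) &
                (m["event_date"] <= m["index_date"] + pd.DateOffset(months=12))]
    first = in_year.groupby("patient_id")["event_date"].min()
    return index_df.set_index("patient_id").assign(detect_date=first)["detect_date"]

def followed_up(detect_dates, qualifying_visits, window_days=84):
    """Measures 2 & 3: >=3 qualifying visits (MH, psychotherapy, or primary care
    with a depression diagnosis) within `window_days` of detection. Patients with
    no qualifying visits in the window simply do not appear (not followed up)."""
    detected = detect_dates.dropna().rename("detect_date").reset_index()
    m = qualifying_visits.merge(detected, on="patient_id")
    in_win = m[(m["visit_date"] >= m["detect_date"]) &
               (m["visit_date"] <= m["detect_date"] + pd.Timedelta(days=window_days))]
    return in_win.groupby("patient_id").size() >= 3
```

Measure 4 reuses the same ≥60-day / ≥4-visit / ≥3-visit logic sketched for the baseline exclusion, applied to the 12 months after the detection date rather than the 6 months before the index visit.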

Table 1. Quality of Depression Care Measures

Accounting for All Patients in Each Measure

We used a counting hierarchy (counting trees) to document the number of patients retained and excluded at every step of measure construction. The counting hierarchy ensures that every patient in the original primary care cohort is accounted for across each branch of the logic: at every node, the patients meeting and not meeting the criterion sum to the population entering that node, and the excluded branches plus the final cohort sum to the full initial population.
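One way to operationalize this invariant, sketched here with an assumed nested-dictionary representation of a counting-tree node (not the project's actual data structure):

```python
def check_counting_tree(node: dict) -> int:
    """Verify the counting-tree invariant: at every branch, the children's counts
    sum exactly to the parent's count, so no patient is lost or double-counted.
    A node is assumed to look like {"label": ..., "n": ..., "children": [...]};
    leaf nodes omit "children". Returns the node's count."""
    children = node.get("children", [])
    if children:
        total = sum(check_counting_tree(child) for child in children)
        if total != node["n"]:
            raise ValueError(f'{node["label"]}: children sum to {total:,}, '
                             f'expected {node["n"]:,}')
    return node["n"]
```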

RESULTS

Overview

Measure development centered on using counting trees to continuously verify that the measures were applied to the full primary care population; measure application used the measures to assess depression care in nine VISNs from FY2000 to FY2010. The expert panel reviewed and critiqued both the development and the application of the measures.

Counting Trees

Counting trees showed that the programming accurately accounted for all patients in the population at every branch node. Figure 2 shows the counting tree for minimally appropriate treatment in FY2005. The tree begins at the far left with the cohort of all patients who had an index visit in primary care during FY2005 in the nine VISNs (n = 2,011,849). The next branch shows which patients were continuously seen at their primary care site over the 2-year period (n = 1,574,532); 22 % (n = 437,317) had not been continuously seen and were therefore not eligible for the measure. The next branches exclude patients who had a prior diagnosis of depression (n = 53,221) or had completed minimal treatment for depression in the prior 6 months (n = 201,814). Of the remaining 1,319,497 patients, 94,130 had a new episode of depression detected in FY2005 (7 % of the eligible population). Among patients with a new episode of depression detected, 82 % (n = 77,533) completed minimally appropriate treatment within 12 months of detection.
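Plugging the FY2005 counts reported above into the counting-tree checker sketched in the Methods illustrates the invariant; the branches are flattened here for brevity, whereas the actual figure splits each node into two branches.

```python
fy2005 = {
    "label": "Index visit in primary care, FY2005", "n": 2_011_849,
    "children": [
        {"label": "Not continuously seen", "n": 437_317},
        {"label": "Continuously seen", "n": 1_574_532,
         "children": [
             {"label": "Depression ICD-9 in prior 6 months", "n": 53_221},
             {"label": "Minimal treatment in prior 6 months", "n": 201_814},
             {"label": "Eligible for detection measure", "n": 1_319_497},
         ]},
    ],
}
check_counting_tree(fy2005)  # returns 2,011,849; every branch sums exactly
```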

Figure 2. Counting tree for completion of minimally appropriate treatment within 12 months after detection of a new episode of depression in federal fiscal year 2005. Abbreviations: No ICD-9 at T−6 (vs. ICD-9 at T−6) = exclusion of patients who had a diagnosis of depression in the 6 months prior to the index visit; No Rx, CPT at T−6 (vs. Rx, CPT at T−6) = exclusion of patients who completed a minimal course of treatment for depression in the prior 6 months; Attn to Dep (vs. No Attn to Dep) = attention (treatment) to depression found; MD Detected (vs. MD Didn’t Detect) = a new episode of depression was detected; Full Tx at Td−12 (vs. No Full Tx at Td−12) = completed minimally appropriate treatment within 12 months of detection of a new episode of depression.

Counting trees assessed meaningfulness (through the face validity of the branches), but they can also be used for quality improvement (to assess what happens to patients who do not meet the measure). For example, in FY2005, the bottom branches identified that 22 % of patients had not been continuously seen over the 2-year period, that 16,909 had an ICD-9 diagnosis of depression in the 6 months prior to the index visit, and that 13,008 patients had not received minimally appropriate treatment in the prior 6 months. Among these 13,008 patients, 86 % (n = 11,167) had an additional episode of depression detected, and 12 % did not receive minimally appropriate treatment within 12 months of detection.

Measures Over Time in Nine VISNs

Over the decade, the number of patients seen in primary care in the nine VISNs increased substantially, from 1.19 million in FY2000 to over 2.26 million in FY2010 (Fig. 3). The cohort without a depression diagnosis or active treatment in the prior 6 months ranged from 790,000 to 1.35 million, and the rate of detection of new episodes of depression remained stable at 7–8 % over the years (not shown). Follow-up at 84 and 180 days was 37 % and 45 %, respectively, in FY2000, and increased to 56 % and 63 % by FY2010 (Fig. 4). Minimally appropriate treatment remained relatively stable: 84 % in FY2000, 82 % in FY2005, and 83 % in FY2010.

Figure 3. Cohorts of VA primary care patients from the nine Veterans Integrated Service Networks in federal fiscal years 2000–2010 (in millions). *Patients without a depression diagnosis or active treatment in the 6 months prior to the index visit.

Figure 4. Among patients with a new episode of depression detected, percent of patients with follow-up and minimal treatment completion (FY2000–FY2010).

Expert Panel Evaluation of the Measures as Developed and Applied

The expert panel included 14 panelists, four of whom were also on the project team. Analysis of the pre-meeting survey (86 % response rate) showed a high level of agreement on the appropriateness of most measure development and cohort construction methods, including the decision to use antidepressants prescribed in primary care as a signal of depression detection even without an accompanying ICD-9 diagnosis. Panelists also validated the cohort definitions and the counting tree-based methods. For the definition of new depression, panelists suggested the name “new episode of depression” to indicate that the patient may have been previously diagnosed but had not been diagnosed or treated in the previous 6 months. For the treatment measure, panelists judged the threshold for treatment completion, although based on prior studies,14 to be too low relative to optimal treatment, especially given the severity and complexity of depression among Veterans; they suggested terming the measure “minimally appropriate treatment.” They discussed potential future modifications, including requiring 90 days of antidepressants (expanded from the 60-day requirement) and/or 90 days of continuous antidepressants, and restricting the 60 days of medications to within 90 days of detection (rather than 12 months). To address continuity of treatment, panelists suggested that future measures require mental health visits (or psychotherapy visits) to be with the same provider. Panelists also endorsed the future goal of similar depression care measures for evaluating the population of patients screening positive for depression in primary care.

DISCUSSION

The VA saw rapid growth in primary care patients from FY2000 to FY2010, an increase of over one million patients. Despite this growth, our measures indicate that the rates of detection of new episodes of depression (7–8 %) and of minimally appropriate treatment (82–84 %) remained stable, suggesting that the VA was able to maintain its standard of care while treating significantly more patients each year.

While our measure of treatment completion mirrors standards used in prior clinical quality improvement trials,14, 27 our expert panel judged that future iterations of this measure should incorporate a higher minimal treatment threshold. This is feasible without changes to the basic measure approach we used. Future efforts to develop measures of excellent care should consider, however, the potential tradeoffs if electronic measures become too specific: greater potential for error, lower measure reliability, and greater likelihood of relying on censored data that do not reflect care for the full population of interest. Furthermore, measures focused on enhancing access to basic care (e.g., performance of HbA1c testing or follow-up of detected depression) often have large impacts on patient outcomes, and low-threshold measures often identify the patients receiving severely inadequate care who are most in need of intervention.
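To illustrate why tightening the threshold does not require changing the basic measure approach, a sketch of the treatment numerator expressed as parameters; the names are hypothetical, and the tightened values reflect one variant along the lines the panel discussed, not an adopted specification.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TreatmentThreshold:
    """Parameters defining the minimally appropriate treatment numerator."""
    min_rx_days: int = 60        # days of antidepressant supply required
    min_mh_visits: int = 4       # mental health specialty visits
    min_psychotherapy: int = 3   # psychotherapy (CPT-coded) visits
    window_days: int = 365       # window after detection in which care must occur

PROTOTYPE = TreatmentThreshold()
# One tightened variant: 90 days of antidepressants, counted within 90 days of detection.
TIGHTENED = TreatmentThreshold(min_rx_days=90, window_days=90)
```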

Follow-up rates increased for both the 84-day and 180-day windows (by 18–19 percentage points), indicating improvements in timely treatment initiation and follow-up; nonetheless, adherence to the follow-up measures was lower than to our treatment measure. Substantial evidence indicates that non-face-to-face modalities can effectively substitute for frequent visits, but administrative codes for these were not available during our study period. Reliable coding of telephone, telehealth, and secure messaging encounters is now available in the VA, and future measures using our development methodology can feasibly incorporate these follow-up modalities.

The measure development methodology, including patient assignment, identification of new depression episodes, and evaluation of treatment over time, proved feasible and led to measures that were stable over time and across multiple primary care sites and regions. The counting trees verify and validate the algorithms underlying the measures and systematically reference the full primary care population to avoid errors due to loss of subjects. Additionally, counting trees can be used to make measures more actionable for local sites. The strict attention to the branching algorithm, the multiple data sources, and the timing of measures around an index visit creates programming challenges, yet is critical.

Using administrative data for measure development has limitations. The measures we developed advance electronic measurement by moving beyond detection based purely on ICD-9 codes to include new antidepressant use.20 Based on ICD-9 codes alone, the detection rate was 1 %; adding antidepressant use for depression detected an additional 6–7 % of primary care patients. However, the 7–8 % detection rate was still considerably lower than published detection rates based on survey-based depression screening.6 We preliminarily tested incorporating a PHQ-2-based detection approach into our measures (unpublished) using FY2010 data and found it to be feasible. In this approach, screening positive is used to identify the applicable population (denominator), and that cohort is then linked to the quality measures. However, because our project required stable measures spanning FY2000–FY2010, and standardized national data on screening (e.g., PHQ-2 and PHQ-9) were not available until FY2008, we could not incorporate these screening/symptom measures into the measurement algorithms. Finally, we developed an antidepressant algorithm to exclude prescriptions for non-depression indications; however, antidepressants used for other conditions (sleep, pain, migraines, etc.) may not have been entirely excluded. Future attention to promoting coding that indicates non-depression-related antidepressant use could improve measure accuracy.
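A minimal sketch of the screening-based denominator approach described above; the table and column names (screens, phq2_score) are hypothetical, and the cutoff of 3 is the conventional PHQ-2 positive-screen threshold rather than a value specified by this project.

```python
def phq2_positive_cohort(screens, index_df, cutoff=3):
    """Patients screening positive on the PHQ-2 (score >= cutoff) define the
    applicable population (denominator), which is then linked to the detection,
    follow-up, and treatment measures sketched earlier."""
    positive_ids = screens.loc[screens["phq2_score"] >= cutoff, "patient_id"].unique()
    return index_df[index_df["patient_id"].isin(positive_ids)]
```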

We developed electronic, population-based, longitudinal depression quality measures that met reasonable standards as meaningful, feasible, and actionable31 for assessing VA depression care over a decade. Our data show that the VA improved depression follow-up between FY2000 and FY2010 and that treatment rates compared favorably with non-VA benchmarks. Looking forward, our measure development methodology can feasibly be adapted to incorporate more stringent definitions of treatment, depression symptom screening data, and future enhancements in care. The methodology and techniques we used to address measurement challenges provide a basis for future performance measure development, especially for other chronic conditions in which longitudinal care must be captured.