INTRODUCTION

Measurement-based care (MBC) is a clinical strategy involving regularly measuring psychological symptom frequency and severity, side effects, and treatment adherence, reviewing measurement trends, and using those findings to inform clinical decision making1,2,3,4. When implemented into primary care5 and specialty mental health care2,3,4,5,6, MBC is associated with detecting treatment non-response, a greater number of changes in or intensification of treatment plan, acceptability among patients and clinicians, and better outcomes3,4,5,6,7. The Joint Commission now requires behavioral health organizations to assess outcomes “through the use of a standardized tool or instrument” (CTS 03.01.09)8, and guidelines recommend MBC in the treatment of individuals with depressive and anxiety disorders3,4,9,10,11.

Less is known about MBC for individuals with bipolar disorder, which includes episodic and/or mixed depressive and manic symptoms12. Because manic and depressive symptoms commonly co-occur in individuals with bipolar disorder13, providing MBC for bipolar disorder would involve assessing both symptom domains, though no patient-reported manic symptom measure is widely adopted. Notably, individuals with bipolar disorder often initially present to primary care, and similar proportions of individuals with bipolar disorder are treated in primary care and specialty settings12,14, making acceptability in primary care a high priority for any measure used in MBC.

Two commonly used research measures, the Altman Mania Rating Scale15 (AMRS) and the Internal State Scale16 (ISS), have limitations as MBC instruments. The AMRS was developed for the inpatient setting and assesses only 5 manic symptoms. In addition, its response sets vary from item to item, which complicates score interpretation, and because each item offers five complete sentences as response options, it is poorly suited to verbal administration. The ISS is more comprehensive and easy to administer, but it is relatively difficult to score compared to more widely adopted symptom measures such as the Patient Health Questionnaire-9 (PHQ-9)9,17.

The PHQ-9 assesses depressive symptoms and is widely adopted in primary care and specialty mental health care settings, so a complementary measure that assesses manic symptoms could increase the use of MBC for bipolar disorder. Furthermore, a recent review outlined ten priorities for MBC research including “develop(ing) brief and psychometrically strong measures to be used in combination”4, such as combining a new patient-reported manic symptom measure with the existing PHQ-9.

Given clinicians’ familiarity with the PHQ-9, the need to assess both depressive and manic symptoms in bipolar disorder, and the advantages of patient-reported measures9,18,19, we developed a complementary brief, patient-reported manic symptom measure, the Patient Mania Questionnaire-9 (PMQ-9). We report the development and psychometric properties of the PMQ-9.

METHODS

Setting and Participants

We analyzed data from the Study to Promote Innovation in Rural Integrated Telepsychiatry (SPIRIT) trial, a randomized pragmatic comparative effectiveness study designed for individuals screening positive for bipolar disorder and/or PTSD in 12 Federally Qualified Health Center systems (FQHCs) (24 clinics) in three states20. Eligible participants were adult patients seen in FQHCs not currently prescribed psychotropic medications from a psychiatrist or psychiatric nurse practitioner and who screened positive for PTSD on the PTSD Checklist (PCL-621) (score of ≥14) and/or for bipolar disorder on the Composite International Diagnostic Interview 3.0 (CIDI22) (positive stem question responses and score of ≥8). Figure 1 shows a participant flow diagram for the study.

Figure 1. SPIRIT CONSORT diagram.

Participants were randomized to 12 months of treatment with either telepsychiatry collaborative care, which included team-based care involving a primary care clinician, care manager, and consulting telepsychiatrist, or telehealth referral, which included direct care by a telepsychiatrist and a telepsychologist.

The current report includes two samples. Sample A was a convenience sample of 114 trial participants who agreed to complete a supplemental survey after the 12-month outcome follow-up. It was used in a cross-sectional analysis to establish test-retest reliability and concurrent validity, with the PMQ-9 administered at the beginning and end of the survey. The AMRS15 and the ISS16, modified for telephone administration, were also administered during this survey. Sample A (n=114) included individuals with a range of disorders representative of the full trial sample20 (n=1004), including 29 individuals (25.4% of sample A) diagnosed with bipolar disorder by a study psychiatrist.

To establish internal consistency and sensitivity to change, sample B included participants who were diagnosed with bipolar disorder by a university-based telepsychiatrist and who completed the PMQ-9 two or more times during treatment in the trial. Telepsychiatrists arrived at diagnoses while providing clinical care to patients via interactive video during the 12-month active treatment period; structured interviews were not used in this pragmatic effectiveness trial. Of the 192 patients diagnosed with bipolar disorder23, 179 completed the PMQ-9 at two or more clinic encounters and were included in the psychometric analyses as sample B.

The Institutional Review Boards at the University of Arkansas for Medical Sciences, University of Michigan, and the University of Washington approved the study protocol.

Measurements

At enrollment and prior to randomization, participant demographic and clinical characteristics were assessed using structured telephone or web-based surveys20. Telepsychiatrist-derived patient diagnoses were recorded in the web-based registry. Participants with a bipolar disorder diagnosis were identified by querying the web-based registry.

Patient bipolar disorder symptoms were monitored at clinic visits with the PHQ-9 for depressive symptoms and the PMQ-9 for manic symptoms. Clinicians and patients determined clinic appointment frequency. The PMQ-9 was used by the telepsychiatrists and telepsychologists in the referral arm, and by the care team in the collaborative care arm. Clinic staff gave the PMQ-9 to patients to complete at the encounter. Scores were recorded in a web-based registry24.

Patient Mania Questionnaire-9

Study investigators developed the PMQ-9 during the preparation phase of the clinical trial in 2015. The trial included primary care clinics where the PHQ-9 was already in use. The study team, participating clinics, and stakeholders identified a need for a manic symptom measure that fit into existing clinic workflows and was easy to administer, score, and interpret. Symptom measures were needed for the trial to support MBC.

Through literature review, investigator discussion, and consultation with bipolar disorder experts, investigators adapted DSM-5 symptoms25 into nine patient self-report items. Several PMQ-9 iterations were reviewed by investigators and experts. A final version was completed before enrolling participants in the clinical trial (Table 1). So that results could inform clinical decision making, we defined preliminary remission and subthreshold criteria as scores of less than 5 and less than 10, respectively.

Table 1 Patient Mania Questionnaire-9 (PMQ-9) Scale

All items in the PMQ-9 and the PHQ-9 included time frame and stem-phrase format of “Over the past week, how often have you….” Consistent with the PHQ-917, PMQ-9 item responses ranged from 0 to 3 with 0 indicating “not at all,” 1 indicating “several days,” 2 indicating “more than half of days,” and 3 indicating “nearly every day.” Item scores were added so that the total score ranged from 0 to 27 with higher scores representing greater severity.
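The scoring rule described above can be sketched in a few lines of Python. This is an illustration only, not the study's registry code: the function names, the band labels, and the choice to reject incomplete questionnaires are assumptions. The cut points (less than 5, less than 10) are the preliminary criteria stated in this section, and the "high" label for scores of 10 or more borrows the wording used for the mood state classifications.

```python
from typing import Sequence

PMQ9_ITEM_COUNT = 9
VALID_RESPONSES = (0, 1, 2, 3)  # "not at all" through "nearly every day"


def pmq9_total(item_responses: Sequence[int]) -> int:
    """Sum nine item responses (each 0-3) into a total score from 0 to 27."""
    if len(item_responses) != PMQ9_ITEM_COUNT:
        # Assumption: incomplete questionnaires are rejected rather than prorated.
        raise ValueError("expected exactly 9 item responses")
    if any(response not in VALID_RESPONSES for response in item_responses):
        raise ValueError("each response must be 0, 1, 2, or 3")
    return sum(item_responses)


def pmq9_preliminary_band(total: int) -> str:
    """Label a total score with the preliminary criteria (<5, <10, 10 or more)."""
    if not 0 <= total <= 27:
        raise ValueError("total score must be between 0 and 27")
    if total < 5:
        return "remission"
    if total < 10:
        return "subthreshold"
    return "high"  # hypothetical label for scores of 10 or more
```

For example, `pmq9_preliminary_band(pmq9_total([1, 0, 2, 1, 0, 1, 1, 0, 1]))` sums the responses to 7 and labels the result "subthreshold".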

Data Analysis

Descriptive statistics were analyzed using data from the web-based registry. PMQ-9 and PHQ-9 means and standard deviations were calculated.

Analyses in Sample A

Test-retest reliability was assessed by calculating correlation coefficients comparing PMQ-9 results administered to participants at two time points, approximately 30 minutes apart, during the survey. Concurrent validity was assessed by comparing PMQ-9 results to two validated measures of manic symptoms administered during the same survey as the test-retest reliability assessment. The AMRS15 is a 5-item scale used to assess the presence and/or severity of manic symptoms; scores range from 0 to 20, with higher scores representing worse severity. The ISS16 classifies bipolar disorder mood states and symptom severity using subscales, with the Activation Subscale (ISS-AS) assessing manic symptom severity. The ISS16 was used to assess concurrent validity both continuously (comparing PMQ-9 scores to ISS-AS scores) and dichotomously (comparing PMQ-9 scores across manic and non-manic states, with a manic state defined by ISS threshold scores of ≥155 on the ISS-AS and ≥125 on the ISS Well-Being subscale).

Analyses in Sample B

Based on all clinical administrations of the PMQ-9 in patients with bipolar disorder, item-level internal consistency was evaluated using Cronbach’s α. We also examined internal consistency of the PHQ-9 in this longitudinal sample. Confirmatory factor analysis of PMQ-9 and PHQ-9 items was used to assess the dimensionality of the scales and whether the PMQ-9 and PHQ-9 represent independent factors (symptom groups). Two distribution-based methods, the standard error of measurement (SEM) and the standard deviation (SD), were used to estimate the minimally important difference (MID). The SEM is calculated as the standard deviation of the baseline score multiplied by the square root of one minus Cronbach’s α. One to two SEMs and 0.2 to 0.5 SD are considered reasonable ranges for preliminary estimates of a measure’s MID26,27.
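For readers unfamiliar with the internal consistency statistic, the following is a minimal sketch of the standard Cronbach’s α formula, α = k/(k−1) × (1 − Σ item variances / variance of total scores), using only the Python standard library. It is not the study's analysis code; the function name and the row-per-administration data layout are assumptions made for illustration.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha, where rows[i][j] is administration i's score on item j."""
    if len(rows) < 2:
        raise ValueError("need at least two administrations")
    k = len(rows[0])
    if k < 2 or any(len(row) != k for row in rows):
        raise ValueError("every row needs the same number (>= 2) of items")
    # Sample variance of each item across administrations.
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    # Sample variance of the summed (total) scores.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

In this sketch, each completed nine-item questionnaire would contribute one row. When items move together perfectly, the sum of item variances is small relative to the variance of the total and α approaches 1.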

Sensitivity to change was assessed by comparing measure scores from first and final clinical encounters and calculating the proportion of participants with each of four mood states classified by PMQ-9 (less than 10, 10 or more) and PHQ-9 (less than 10, 10 or more) scores. We created mood state classifications informed by DSM-5 criteria25 and ISS classifications16. Classifications included subthreshold symptom burden (PHQ-9 <10, PMQ-9 <10), high depressive and subthreshold manic symptom burden (PHQ-9 ≥10, PMQ-9 <10), subthreshold depressive and high manic symptom burden (PHQ-9 <10, PMQ-9 ≥10), and high depressive and high manic symptom burden (PHQ-9 ≥10, PMQ-9 ≥10).
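The four-way classification above reduces to two threshold checks. The sketch below is illustrative only: the function names and the shortened state labels are assumptions, while the cutoff of 10 on each measure comes from the definitions in this section.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CUTOFF = 10  # scores of 10 or more count as high symptom burden on either measure


def classify_mood_state(phq9_total: int, pmq9_total: int) -> str:
    """Assign one of four mood states from PHQ-9 and PMQ-9 total scores."""
    high_depressive = phq9_total >= CUTOFF
    high_manic = pmq9_total >= CUTOFF
    if high_depressive and high_manic:
        return "high depressive and high manic"
    if high_depressive:
        return "high depressive and subthreshold manic"
    if high_manic:
        return "subthreshold depressive and high manic"
    return "subthreshold"


def mood_state_proportions(score_pairs: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Proportion of (PHQ-9, PMQ-9) score pairs falling in each mood state."""
    pairs = list(score_pairs)
    if not pairs:
        raise ValueError("need at least one score pair")
    counts = Counter(classify_mood_state(phq9, pmq9) for phq9, pmq9 in pairs)
    return {state: count / len(pairs) for state, count in counts.items()}
```

Calling `mood_state_proportions` once on first-encounter score pairs and once on final-encounter pairs would give the two distributions being compared.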

RESULTS

Participants

The baseline demographic and clinical characteristics of the study sample are shown in Table 2. Participant scores on the Veterans RAND 12-item Health Survey Mental Health Composite and Physical Health Composite28 indicated mental health quality of life was 2.5 standard deviations below the national mean.

Table 2 Baseline Survey Demographic and Clinical Characteristics of Samples

Descriptive statistics of PMQ-9 and relationship to PHQ-9

The PMQ-9 was completed at 1511 clinical encounters across the 179 patients with bipolar disorder. Mean PMQ-9 scores at the first and last clinical encounters were 14.5 (SD 6.5) and 10.1 (SD 7.0), a 27% decrease in mean score during treatment in the clinical trial. Mean PHQ-9 scores followed a similar pattern, with a first mean of 16.6 (SD 5.8), a final mean of 12.3 (SD 7.2), and a percentage change of 24%. A PMQ-9 score of less than 5 at the final measurement occurred in 25% of the sample, compared to 18% for the PHQ-9. Approximately 35% of the sample reported a 50% or greater reduction in PMQ-9 score from the first to the last encounter, compared to 35% for the PHQ-9. Among individuals with a 50% or greater reduction in PHQ-9 score from the first to the last encounter, the odds of a 50% or greater reduction in PMQ-9 score were 7.9 (95% CI 3.8 – 16.3), indicating that changes in PMQ-9 and PHQ-9 scores were positively correlated.

Psychometrics

Results in Sample A

The Pearson correlation coefficient for test-retest reliability was 0.85 (p<0.0001). The Pearson correlation coefficient for concurrent validity compared to the ISS-Activation Subscale16 was 0.70 (p<0.0001) and compared to the AMRS15 was 0.26 (p=0.007). Individuals demonstrating a current hypomanic or manic state as classified by the ISS had a mean PMQ-9 score of 14.9 (SD 4.2) (n=17), compared to 9.9 (SD 6.6) (n=93) in those not demonstrating a current hypomanic or manic state (t(108) = -3.03, p=0.003).

Results in Sample B

Internal consistency was high and similar for the PMQ-9 (Cronbach’s alpha = 0.88) and the PHQ-9 (Cronbach’s alpha = 0.88). Factor analysis of the PMQ-9 and PHQ-9 items together showed two factors that explained 55% of the item variance, with items loading 0.40 or greater on their respective factors (Table 3). All 9 PHQ-9 items loaded on a single factor (factor 1, depression) and had minor loadings on the second factor. The 9 PMQ-9 items had their primary loadings on the second factor (factor 2, mania). Four PMQ-9 items also loaded ≥ 0.40 on the depression factor, and two PHQ-9 items also loaded ≥ 0.40 on the mania factor. In general, however, the factor loadings indicate largely unidimensional depression and mania factors, with cross-loading of several symptoms.

Table 3 Factor Analysis of the Patient Health Questionnaire-9 (PHQ-9) and the Patient Mania Questionnaire-9 (PMQ-9)

One and two SEMs for the PMQ-9 were 2.25 and 4.50, respectively, and 0.2, 0.35, and 0.50 standard deviations were 1.30, 2.28, and 3.25. Thus, a preliminary point estimate of the MID using distribution-based approaches would be around 3 points, with a range of 2 to 4.
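The arithmetic behind these estimates can be reproduced with a short sketch. It assumes the inputs were the first-encounter PMQ-9 standard deviation (6.5) and the PMQ-9 Cronbach’s alpha (0.88) reported above; that pairing is an inference from the reported values rather than something stated explicitly, and the function name is hypothetical.

```python
from math import sqrt
from typing import Dict


def distribution_based_mid(baseline_sd: float, alpha: float) -> Dict[str, float]:
    """Candidate MID values from the SEM and from fractions of the baseline SD."""
    if baseline_sd <= 0 or not 0 <= alpha <= 1:
        raise ValueError("baseline_sd must be positive and alpha within [0, 1]")
    # SEM = baseline SD multiplied by the square root of (1 - Cronbach's alpha).
    sem = baseline_sd * sqrt(1 - alpha)
    return {
        "1 SEM": sem,
        "2 SEM": 2 * sem,
        "0.2 SD": 0.2 * baseline_sd,
        "0.35 SD": 0.35 * baseline_sd,
        "0.5 SD": 0.5 * baseline_sd,
    }


# Assumed inputs: first-encounter SD and Cronbach's alpha as reported in the Results.
estimates = distribution_based_mid(baseline_sd=6.5, alpha=0.88)
for label, value in estimates.items():
    print(f"{label}: {value:.3f}")
```

Under these assumed inputs the values agree with the reported figures to within rounding, and the candidates span roughly 1.3 to 4.5 points, bracketing the preliminary estimate of about 3 points.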

The first and last mood states according to the PMQ-9 and PHQ-9 classifications defined above are shown in Figure 2. The distribution of participants across the four mood states differed significantly from first to last symptom measurement (χ²(9) = 26.69, p=.002). The proportion of individuals with high depressive and high manic symptom burden (PHQ-9 ≥10, PMQ-9 ≥10) decreased, and the proportion of individuals with subthreshold depressive and subthreshold manic symptom burden (PHQ-9 <10, PMQ-9 <10) increased.

Figure 2. Proportion of patients who had low and high depressive and manic symptoms at first and final assessment during treatment. High depressive symptoms were defined as a PHQ-9 score ≥ 10, and high manic symptoms were defined as a PMQ-9 score ≥ 10.

DISCUSSION

We developed a novel patient-reported manic symptom measure (PMQ-9) that is feasible to complete and score during primary care visits and mental health referral visits conducted in primary care, and that was used regularly across 12 healthcare settings in a large pragmatic clinical trial. The PMQ-9 showed excellent psychometric properties in two analytic samples. Factor analysis confirmed that the PMQ-9 and PHQ-9 represented, for the most part, independent constructs. The use of both measures in tandem may be an efficient way to monitor mood symptoms of bipolar disorder.

Evidence and recommendations for measurement-based care have grown in recent years3,4,29. Treatment guideline authors have recommended the use of bipolar disorder symptom measures to monitor treatment response30,31. Additionally, the large-scale STEP-BD study involving patients with bipolar disorder demonstrated feasibility of using clinician-observed measures to inform treatment decisions1, and that the use of MBC was associated with few occurrences of treatment inertia32.

However, questions remain about implementing MBC for bipolar disorder treatment, including which symptom measure to use in which setting, and across settings. Our current results, combined with findings from two systematic reviews18,33 of bipolar disorder symptom measures, can inform this decision. Although it was not required, the PMQ-9 was broadly used in primary care in this large pragmatic trial, suggesting acceptability to patients and clinicians.

A recent systematic review33 of patient-reported manic symptom measures found the most extensively studied measures are the Internal State Scale (ISS)16, the Altman Mania Rating Scale (AMRS)15, and the Self-Report Manic Inventory34. Our study found adequate to excellent concurrent validity of the PMQ-9 compared to the ISS-AS and the AMRS (the Self-Report Manic Inventory was not evaluated in our study). The lower correlation of 0.26 between the AMRS and the PMQ-9 was similar to a reported correlation of 0.16 between the AMRS and the ISS35. It is likely that the differences in the purpose of the scales (the AMRS for assessing acute manic symptoms in hospitalized individuals, ISS and PMQ-9 for monitoring treatment over time) account for the higher correlation between the ISS and PMQ-9 compared to correlations with the AMRS. Additionally, the AMRS is intended to differentiate individuals with mania from those without mania35, while the ISS and PMQ-9 are intended to monitor a wider range of manic symptom severity over time. The favorable psychometrics of the PMQ-9 compared to two of the most studied manic symptom measures also support the use of PMQ-9.

The PMQ-9 is a comprehensive measure assessing a range of manic symptoms occurring throughout the course of bipolar disorder, including during periods of subsyndromal manic symptoms, combined manic and depressive symptoms, and concurrently during depressive episodes, all of which are symptom experiences occurring more often than a full manic episode36. This contrasts with the AMRS15 which assesses symptom severity during full manic episodes and does not assess distractibility or faster thinking, which are two manic symptoms occurring commonly during bipolar depression13.

Broad adoption of MBC for bipolar disorder in general primary and mental healthcare settings, where most patients with bipolar disorder present for care, will require options for bipolar disorder symptom measures that are acceptable to clinicians and patients, easily interpretable, psychometrically sound, and feasible to use longitudinally. Our current results show that the PMQ-9 was widely used (concurrently with the PHQ-9), was acceptable, and has favorable psychometric properties and a use distinct from existing measures. These findings suggest that the PMQ-9 combined with the PHQ-9 may be a good candidate to monitor bipolar disorder treatment in primary care and mental health care clinical settings.

The use of the PMQ-9 across settings could help patients and clinicians compare current to past clinical status based on symptom scores and facilitate efficient communication between primary and specialty mental health clinicians. Reports from collaborative care programs, which are increasingly used to care for patients with common mental disorders, have shown that even though screening protocols are designed to detect patients with depression and anxiety, clinicians often encounter patients with bipolar disorder37,38. Concurrent use of manic and depressive symptom measures for individuals with bipolar disorder may help collaborative care teams monitor and adjust treatment more efficiently and effectively. Indirect care models39 such as e-consults may also use such measures to help clinicians describe clinical status to psychiatric consultants.

Limitations

Concurrent validity was assessed using versions of two validated measures, the AMRS15 and the ISS16, that were modified for telephone administration, potentially affecting their psychometric properties. Test-retest reliability and concurrent validity were assessed in a relatively small cross-sectional sample (sample A). The frequency of PMQ-9 administration varied for participants in the longitudinal sample (sample B). Data are lacking on how clinicians and patients used the PMQ-9 to inform treatment decisions. Cut-offs for symptom severity were determined based on clinical judgment coupled with parallelism with the PHQ-9 and should be further evaluated with additional measures of construct and criterion validity. Additionally, determining operating characteristics (i.e., sensitivity and specificity) of the PMQ-9 to identify (hypo)manic episodes would require administering it to a diverse sample of patients also evaluated by a structured psychiatric interview conducted by a rater masked to PMQ-9 results.

Conclusion

The PMQ-9 demonstrated excellent test-retest reliability, concurrent validity, internal consistency, and sensitivity to change and was acceptable to patients and clinicians in a pragmatic clinical trial. Combined with the PHQ-917, this brief measure could inform MBC for individuals with bipolar disorder in primary care and mental health care settings given its ease of administration and familiar self-report response format. The next steps include evaluating if the PMQ-9 facilitates and promotes the uptake of MBC for bipolar disorder, especially in primary care, and whether the use of MBC for bipolar disorder is associated with addressing treatment inertia and improving outcomes.