Background

Almost all patients with bipolar disorder (BD) require maintenance pharmacotherapy to prevent subsequent episodes, complications, and residual disabilities even after acute episodes (Grunze et al. 2018; Yatham et al. 2018). Lithium and valproate are commonly used for maintenance treatment of BD, and several newer drugs have been added to the list of first-line recommendations (Miura et al. 2014; Goodwin et al. 2016; Yatham et al. 2018). However, the clinical and biological correlates of long-term treatment response to individual mood stabilizers are not well-defined (Pisanu et al. 2018).

In the last decade, the feasibility of large-scale pharmacogenomic studies has rapidly increased, and the international Consortium on Lithium Genetics (ConLiGen) and other groups have tried to identify the genetic basis of responsiveness to mood stabilizers (Manchia et al. 2013; Rybakowski 2013; Hou et al. 2016; Zhu et al. 2017). Considering the recurrent and biphasic course of BD, exploration of the genetic and clinical factors associated with maintenance therapy requires long-term observation. Therefore, retrospective measurement in naturalistic clinical settings is inevitable. In addition, the need to incorporate clinical correlates into pharmacogenetic analyses has been recognized (Lin et al. 2018).

The most widely adopted measurement tool for long-term treatment response to mood stabilizers in BD is the Alda scale (Manchia et al. 2013). It retrospectively measures global treatment outcome (A score) and several confounding factors that affect treatment outcome (B score). The total score is a composite score acquired by subtracting B from A (Grof et al. 2002). Recently, it has been suggested that the Alda scale requires modification. Scott et al. (Scott et al. 2017, 2019) raised a validity issue regarding the use of a single scoring system for the B score, which covers multiple heterogeneous confounders. They suggested a machine-learning approach to overcome this problem. Other researchers used global outcome scores (A score) in subjects with comparable conditions in terms of confounding factors, i.e., among low B scorers (Lee et al. 2011; Chen et al. 2016; Hou et al. 2016). In addition, whether the number and frequency of episodes before starting the index medication (reflected in B1 and B2) need to be controlled as confounding factors or included as candidate outcome predictors depends on the study purpose and design. If the study aims to explore predictors or correlates of drug response, the previous course needs to be included in the analysis as a main independent variable. In that case, the response measurement system of the Alda scale needs to be modified.

Another important issue to be considered when assessing the long-term response to mood stabilizers is the enrollment of patients who are treated with mood stabilizer combination therapy. Until recently, most pharmacological studies measuring long-term treatment response to specific mood stabilizers recruited patients undergoing monotherapy (Gyulai et al. 2003; Pfennig et al. 2010; Manchia et al. 2013; Sportiche et al. 2017). In clinical practice, however, concomitant use of more than one mood stabilizer happens frequently, especially in patients with poor treatment response (Baek et al. 2014). Therefore, excluding those patients might generate a skewed distribution of subjects in terms of drug response, which results in a decrease in statistical power in clinical and biomarker studies. In a previous study by some of the present authors (Ahn et al. 2017), we tried to identify clinical correlates of long-term mood stabilizer response in patients taking lithium and/or valproate. In that study, we used the Alda scale and treated the lithium and valproate combination as a single index medication, and identified overall response correlates for the two drugs.

This study aimed to compare long-term treatment response among groups of patients treated with lithium and valproate. In measuring treatment response using the Alda scale, we included patients receiving mood stabilizer combination therapy and intended to explore whether the Alda scale is applicable in subjects under mood stabilizer combination therapy. We also explored the clinical correlates of the response to each drug.

Methods

Subjects

Patients who met DSM-IV criteria for bipolar I (BD-I) or bipolar II (BD-II) disorder and who had received treatment with lithium and/or valproate for more than 2 years between March 2009 and September 2017 at the Bipolar Disorder Clinic of Samsung Medical Center, a tertiary-care university-affiliated hospital, were included in the study. Two drugs, lithium and valproate, were the index medications for the study, and users of both drugs were included in the sample population. The lithium and valproate combination group included patients who had received combination therapy for more than 2 years. The best treatment was provided to each patient based on treatment guidelines (Grunze et al. 2013; Yatham et al. 2018), clinicians’ experience, and patients’ special concerns regarding adverse effects of the drug. We limited the age range of the subjects to between 18 and 55 years because older patients with bipolar disorder could have distinct clinical courses and treatment responses (Sajatovic et al. 2015). Those who showed evidence of neurologic disorders or general medical conditions related to mental symptoms were excluded. A total of 102 patients who met the above criteria and agreed to participate were enrolled in the study. All subjects had previously participated in the authors’ clinical and genetic studies (Baek et al. 2011, 2016, 2019; Yang et al. 2015; Ahn et al. 2017), and 80 of them were involved in the previous study on long-term response to mood stabilizers (Ahn et al. 2017). This study was approved by the Institutional Review Board of Samsung Medical Center.

Assessment of clinical characteristics

Comprehensive disease characteristics had been evaluated in previous studies described elsewhere (Baek et al. 2011, 2016, 2019; Yang et al. 2015; Ahn et al. 2017). Each evaluation was carried out by a direct interview using the revised version of the Korean version of the Diagnostic Interview for Genetic Studies (Joo et al. 2004) or the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID). Basic demographic and clinical characteristics, clinical course of BD, and manifested symptoms of manic and depressive episodes on a lifetime basis were evaluated (Table 1).

Table 1 Comparison of baseline characteristics between medication groups

Frequent experience of episodes was defined as having mood episodes, either depressive or (hypo)manic, more than once per year after illness onset; this threshold corresponds to an episode frequency above the 25th percentile of the total patient sample in our previous study with a larger sample size (Baek et al. 2019). The polarity of onset was defined as the type of the first episode, where 0 denoted depressive and 1 denoted (hypo)manic. Rapid cycling and mixed episodes were defined following the DSM-IV-TR criteria. We measured seasonality using the Seasonal Pattern Assessment Questionnaire (SPAQ) (Rosenthal et al. 1987). We defined seasonality as having SPAQ-defined subsyndromal or syndromal seasonal affective disorder. Psychotic symptoms were defined as the experience of psychotic symptoms at least once in the patient’s lifetime. Details of symptoms during the most severe (hypo)manic or depressive episode were evaluated using the definition of each symptom criterion in the DSM (American Psychiatric Association 2013).

Assessment of treatment response

Long-term response to treatment was evaluated through retrospective reviews of clinical records. When possible, additional information was obtained directly from the patients during their visits to the outpatient department. Assessments were performed using the Alda scale (Grof et al. 2002) and the overall section of item III of the CGI-BP (CGI-BP-III-O) (Spearing et al. 1997) at the same time. Treatment response in the combination group was evaluated based on the period when the patients received combination therapy.

The Alda scale consists of two rating sections. The Alda A score measures the degree of improvement during the course of treatment, from 0 to 10. The Alda B score evaluates confounding factors that affect the outcome independently of the medication (Criterion B) (Grof et al. 2002). The B1 and B2 parameters measure the number and frequency of previous episodes, while B3 and B4 measure whether subjects received medication for a sufficient period of time and were adequately compliant. B5 assesses the usage of additional medications to improve symptoms. The total Alda score ranges from 0 to 10 and is obtained by subtracting score B from score A. A higher Alda score indicates a better response to treatment. Additionally, we classified participants into two response groups (‘good/moderate responders’ vs. ‘poor responders’). The best-fit theoretical model of two components, with the cut-off point at a total score of 4.5, was set by frequentist mixture analysis in our previous study (Ahn et al. 2017). Therefore, a total score of 5 or higher was defined as a good/moderate response and a score of 4 or lower was defined as a poor response.
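The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not code used in the study. The names `AldaRating`, `alda_total`, and `response_group` are hypothetical. Two points are assumptions rather than statements from the text above: that each B item is rated 0, 1, or 2, and that a negative difference is set to 0 so that the total stays within the stated 0 to 10 range.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AldaRating:
    """One patient's ratings (hypothetical container for illustration)."""
    a_score: int                            # Criterion A, 0 to 10
    b_items: Tuple[int, int, int, int, int]  # B1..B5, assumed 0, 1, or 2 each


def alda_total(rating: AldaRating) -> int:
    """Total score = A minus the sum of B items, floored at 0 (assumption)."""
    if not 0 <= rating.a_score <= 10:
        raise ValueError("A score must be between 0 and 10")
    if len(rating.b_items) != 5 or any(b not in (0, 1, 2) for b in rating.b_items):
        raise ValueError("expected five B items, each scored 0, 1, or 2")
    return max(0, rating.a_score - sum(rating.b_items))


def response_group(total: int) -> str:
    """Apply the 4.5 cut-off: 5 or higher is good/moderate, 4 or lower is poor."""
    return "good/moderate responder" if total >= 5 else "poor responder"
```

For example, a hypothetical patient with an A score of 8 and B items (1, 0, 0, 1, 2) would have a total score of 4 and fall into the poor responder group under this rule.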

The CGI-BP-III-O measures the degree of symptom improvement from the worst phase of illness prior to the initiation of the index medication; it ranges from 1 (very much improved) to 4 (no change) to 7 (very much worse).

Two research psychiatrists and the clinician who treated each patient (KSH, JHB, SWA, S-YY, and JL) independently reviewed the hospital records and had meetings to determine the consensus score on the treatment response.

Statistical analysis

We divided the subjects into three groups, the lithium monotherapy group (lithium group), the valproate monotherapy group (valproate group), and the lithium and valproate combination group (combination group), in order to compare participants’ baseline characteristics and long-term treatment response. Comparisons of demographic and clinical variables and treatment response between the three medication groups were performed using the chi-square test (or Fisher’s exact test) for categorical variables and analysis of variance (ANOVA) for continuous variables. Post-hoc analyses were done using the Bonferroni method for categorical variables and Tukey’s test for continuous variables. The Pearson correlation coefficient was calculated to measure the correlation between the scores of treatment response measures.
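For readers who want to see what these tests compute, the following is a minimal standard-library sketch of the three test statistics (chi-square, one-way ANOVA F, and Pearson r). It is not the SPSS procedure used in the study, the function names are hypothetical, and it returns statistics only; p-values and post-hoc corrections would come from a statistics package.

```python
from math import sqrt
from statistics import fmean


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def anova_f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
```

With toy values (not study data), `anova_f_statistic([[1, 2, 3], [2, 3, 4], [5, 6, 7]])` gives an F of 13 with 2 and 6 degrees of freedom.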

We conducted simple regression analyses to explore the clinical correlates of the long-term treatment response to each drug. We performed these analyses excluding the combination group. The Alda total score was entered as a dependent variable, and each clinical factor was separately entered as an independent variable.
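As an illustration of a simple (single-predictor) regression of this kind, the sketch below fits an ordinary least-squares line with the total Alda score as the outcome and one clinical factor as the predictor. It is a hedged example, not the SPSS analysis used in the study; the function name `simple_regression` and the toy values are hypothetical.

```python
from statistics import fmean


def simple_regression(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Toy values only (not study data): a binary factor coded 0/1 and
# hypothetical total Alda scores for four patients.
factor_present = [0, 0, 1, 1]
total_alda = [6, 8, 3, 5]
slope, intercept = simple_regression(factor_present, total_alda)
```

With a binary predictor coded 0/1, the slope equals the difference in mean outcome between the two groups (here 4 minus 7, i.e., a slope of -3 with an intercept of 7), which is why a negative coefficient indicates a lower total Alda score when the factor is present.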

Probability (p) values less than 0.05 were considered statistically significant. All statistical analyses were done with IBM SPSS version 25.0.

Results

Baseline characteristics of the study subjects

Among all 102 subjects, 29 (28.4%) received lithium, 56 (54.9%) received valproate, and 17 (16.7%) received both lithium and valproate (combination group). Eighty-seven (85.3%) had BD-I and fifteen (14.7%) had BD-II. The mean duration of medication was 95.4 months (standard deviation: 70.9, range: 24–192). All subjects had received the index medication for two or more years (B3 = 0), and most of them showed adequate compliance, i.e., B4 = 0 or 1 in 90.2% of the subjects. Table 1 shows the detailed baseline characteristics of the participants. No significant differences were observed among the groups in terms of sociodemographic and disease characteristics including psychiatric comorbid conditions, seasonality, presence of psychotic symptoms, and mixed episodes. Of the manic symptoms measured, decreased need for sleep was more frequently observed in the valproate group compared to the lithium group (post-hoc p = 0.033) and the combination group (post-hoc p = 0.047). No significant differences were observed among groups regarding depressive symptoms.

Comparison of treatment response among the three medication groups

Significant differences in treatment response were observed among the three medication groups for all the response measurements, and the combination group showed the worst response (Table 2).

Table 2 Comparison of treatment responses between medication groups

The combination group showed a greater CGI-BP-III-O score compared to the lithium group (post-hoc p = 0.019) and the valproate group (post-hoc p = 0.003). In addition, the combination group showed a lower Alda A score (post-hoc p = 0.002) and total score (post-hoc p < 0.001) in comparison with the valproate group. The valproate group showed a greater total Alda score compared to the lithium group (post-hoc p = 0.038). When classifying patients into two groups according to the cut-off score set in our previous study (good/moderate responder vs. poor responder), the combination group had the highest rate of ‘poor responders’, with a statistically significant difference compared to the valproate group (post-hoc p = 0.023).

The three treatment outcome measures were highly correlated with one another (Additional file 1: Table S1).

Figure 1 illustrates the comparative distributions of the Alda scale scores between the combination group and the monotherapy groups. The combination group generally had lower Alda A and total scores compared to the monotherapy groups.

Fig. 1 Distribution of Alda scale scores in the monotherapy and combination groups. The x-axis illustrates the scores of the Alda and the y-axis illustrates percentage of patients in each group. a Alda A scores of the lithium users. b Total Alda scores of lithium users. c Alda A scores of valproate users. d Total Alda scores of valproate users

We additionally compared treatment response in the subset of subjects with BD-I (n = 87). These analyses showed similar results to those found in all subjects (Table 2). However, no significant difference was detected in terms of CGI-BP-III-O score.

Previous treatments of the combination group

We reviewed details of the previous treatment histories of the combination group (Table 3). We examined whether the treatment regimens recommended for maintenance therapy in the CANMAT-ISBD guideline (Yatham et al. 2018) had been administered to each patient. In three patients (patient numbers 1, 3, and 9 in Table 3), lithium and valproate combination therapy was tried as the second strategy (3/17 = 17.6%). In the others, it was tried as the third or later treatment strategy (14/17 = 82.4%). Valproate and atypical antipsychotic combination therapy was the most commonly applied strategy (13/17 = 76.5%) before the current medications, followed by lithium and atypical antipsychotic combination therapy (10/17 = 58.8%). An antidepressant was tried in only one patient, to alleviate obsessive-compulsive symptoms (patient number 14 in Table 3).

Table 3 Summary of previous and current medications for the maintenance therapy in the combination group

Clinical correlates of treatment response for each drug

Table 4 and Additional file 1: Table S2 display the results of a simple regression analysis of total Alda score. In lithium users, older age at onset and (hypo) manic episode at onset showed significant positive associations with total Alda score. The presence of mixed episodes and comorbid anxiety disorders or obsessive–compulsive disorder showed a significant negative association with the total Alda score in valproate users (Table 4).

Table 4 Clinical correlates of long-term treatment response in bipolar disorder patients: Results of simple regression analyses using the total Alda score as a dependent variable

Among symptoms of depressive or manic episodes, increased appetite during depressive episodes was significantly associated with total Alda score in valproate users (Additional file 1: Table S2).

Discussion

In this study, we evaluated long-term response to maintenance treatment with specific mood stabilizers including subjects under mood stabilizer combination therapy. We observed that patients who received combination therapy showed poorer treatment response compared to those who received monotherapy. We also identified distinct clinical correlates of response to lithium and valproate.

The initial Alda scale (Grof et al. 2002) stated that the systematic use of antidepressants, antipsychotics, or additional mood stabilizers should be given a score of 2 on item B5 in order to suggest that the link between improvement and a specific treatment is less certain. However, in ConLiGen projects, item B5 was modified to allow the systematic use of antidepressant or antipsychotic medication only (Schulze et al. 2010). Thus, subjects under mood stabilizer combination therapy were excluded from the beginning. Furthermore, a recent study by Scott and colleagues (Scott et al. 2019) even suggested that subjects under combination therapy with antidepressants/antipsychotics (high B5 scorers) need to be excluded at the outset. However, combination treatment for BD is quite pervasive and rapidly increasing (Greil et al. 2012; Baek et al. 2014; Fung et al. 2019). If we were to exclude all patients under combination treatment, only a limited number of patients could be included as subjects.

Among combination treatments, the mood stabilizer combination strategy occupies a unique position in the treatment of BD. Lithium and valproate are both first-line agents for maintenance treatment of BD. However, the combination of two first-line agents is generally recommended when a patient does not show a satisfactory response to a single first-line agent (Yatham et al. 2018). Thus, patients given mood stabilizer combination treatment are more likely to be poor treatment responders. If we exclude all patients under mood stabilizer combination treatment, we will be disregarding the characteristics of many poor treatment responders in the analysis of pharmacological and pharmacogenetic factors.

In line with the preceding discussion, the combination group showed worse long-term treatment response compared to both monotherapy groups (Fig. 1 and Table 2). The mood stabilizer combination group had the lowest mean Alda A score, indicating that their low treatment response is not derived from high B scores associated with combination treatment. No significant difference was observed between the combination group and the monotherapy groups (either the lithium or the valproate group) in terms of baseline illness characteristics, indicating that the difference in long-term treatment response is not derived from baseline illness characteristics. A few studies have explored long-term treatment responses to combination therapy in comparison to mood stabilizer monotherapy. The BALANCE study (Geddes et al. 2010) compared the long-term treatment responses of mood stabilizer monotherapy and combination therapy and reported a greater decrease in recurrences with combination therapy. In that study, patients were randomly allocated to monotherapy vs. combination therapy groups. By contrast, in our study, the combination of lithium and valproate was chosen for individual patients in the course of their treatment. Therefore, most of our subjects in the combination group had experienced trials of various other treatment regimens before the current medications, indicating inadequate responses to previous treatments including lithium and/or valproate monotherapies (Table 3). In another study, Musetti et al. (Musetti et al. 2018) compared monotherapy and combination therapy groups in a naturalistic setting, as in the current study. In that study, the current regimens of the combination group were chosen after other trials including mood stabilizer monotherapy. As a result, the combination group had a greater previous episode frequency, and the reduction rate of recurrence was higher due to the worse previous illness course. Thus, to some extent, the findings of Musetti et al. corroborate our findings, showing that poor treatment responders were allocated to the combination therapy group.

Systematic meta-analysis and post-hoc analysis of randomized controlled trials found no significant differences between lithium and valproate in terms of long-term efficacy (Cipriani et al. 2013; Kang et al. 2020). In contrast, observational studies (Garnham et al. 2007; Kessing et al. 2018), a randomized open trial (Geddes et al. 2010) and a population-based cohort study (Hayes et al. 2016) reported the superiority of lithium as a monotherapy agent for maintenance treatment of BD. Our study did not find significant differences in terms of treatment response between lithium and valproate. It is also unknown whether there are distinct clinical correlates of treatment response depending on type of mood stabilizer. Considering that lithium and valproate have distinct neurobiological targets (Chiu et al. 2013), different clinical factors could contribute to the long-term response to each medication.

Prior studies on the clinical correlates of the long-term effects of mood stabilizers have focused more on lithium. A recent meta-analysis (Hui et al. 2019) reported that mania-depression-interval sequence (compared to depression-mania sequence), absence of rapid cycling, absence of psychotic symptoms, shorter pre-lithium illness duration, family history of bipolar disorder, and later illness onset were associated with better long-term treatment response to lithium. In our study, older age at onset and (hypo)manic episode at onset were significantly associated with long-term treatment response. Other factors that showed associations in the meta-analysis did not show significant associations in our study.

Regarding clinical correlates of valproate response, Gyulai et al. (Gyulai et al. 2003) reported that worse depression symptoms, rapid cycling, and comorbid alcohol use disorders were associated with the efficacy of valproate in a maintenance treatment. In addition, a study by Garnham et al. (Garnham et al. 2007) reported psychosis was associated with treatment response to valproate. In our study, comorbid anxiety disorder (Otto et al. 2006), obsessive–compulsive disorder (Shashidhara et al. 2015) and mixed episode (McIntyre et al. 2013; Young and Eberhard 2015), which are commonly known to be associated with poorer treatment responses, showed significant association with treatment response to valproate.

Several limitations of this study should be considered. First, the sample size is small. Subgroup analysis of BD-II patients could not be performed due to the limited sample size. For the same reason, evaluation of clinical correlates in the combination group was not possible. Second, this study measured treatment response retrospectively. However, as previously stated, retrospective evaluation is inevitable when evaluating long-term treatment response. We tried to carry out a comprehensive evaluation using diverse sources of information when determining long-term treatment response. Third, each patient’s treatment strategy can be affected by the treating physician’s preference. The clinicians’ preferences can indirectly affect the clinical correlates of the treatment response to each drug. Finally, because the participants were recruited from a single tertiary academic teaching hospital, it may be difficult to generalize the study’s results.

Notwithstanding these limitations, this study has some strengths. As a naturalistic observational study, it may reflect the situation in real-world clinical practice. In addition, baseline disease characteristics were independently assessed in previous studies by the same authors before assessment of treatment response. Finally, even though the assessment of treatment response was retrospective, it was reliable, as the subjects were followed up for more than two years at a single institution.

Conclusions

In this study, we took a novel approach to evaluating long-term response to mood stabilizers. By applying the Alda scale with minor modifications, we measured the responses to lithium and valproate including subjects under combination therapy. The combination treatment group showed poor treatment outcomes. We identified several factors related to the long-term treatment response to each drug. Further efforts are warranted towards the development of measurement methods and study designs that can handle the effects of combination treatment in clinical and genetic studies of BD.