Background

Both major depressive disorder (MDD) and bipolar disorder (BD) are common chronic psychiatric disorders that often damage patients' health and can cause severe disability [1]. Accurate diagnosis of MDD and BD, which depends primarily on clinical manifestations, is the basis of individualized treatment and improved prognosis. However, the onset of BD is usually dominated by a depressive episode [2], and the clinical features of MDD and BD often overlap, which makes the two disorders difficult for clinicians to tell apart. Reducing misdiagnosis is therefore critical to avoid delays in proper therapy and poorer outcomes.

Although BD is a multifactorial disorder with several subtypes, such as bipolar disorder I (BD I) and bipolar disorder II (BD II), some large genome-wide association studies have found that no significant locus identified for BD overlapped with those identified for depression, while all BD subtypes share common variant heritability [3]. This provides a theoretical basis for the potential existence of biomarkers that distinguish MDD from BD. Emerging studies have shown that some potential indicators may help improve diagnostic accuracy between MDD and certain types of BD, or discriminate between different phases of BD. These include biomarkers of an individual system, such as the blood system [4] or the immune system [5], and biomarkers of multiple systems, such as inflammation-immune response traits [6, 7], metabolic syndrome components [8, 9], or composites of potential gene or protein biomarkers from laboratory research [10, 11].

Given the complexity and heterogeneity of the etiology of mental diseases, biomarkers of multiple systems are more likely to differentiate effectively between MDD and BD. A composite of indicators from routinely measured examinations can not only reveal the function or status of different systems but also has the potential for clinical use, given its easy accessibility.

In this study, we hypothesized that a panel of biomarkers combining routinely measured indicators might help differentiate between MDD and BD. Accordingly, the aim of the study was to construct a preliminary prediction model to distinguish MDD from BD and to evaluate its performance.

Methods

Study population

The study population consisted of patients admitted to the Fifth Affiliated Hospital of Sun Yat-sen University from January 2019 to December 2021. For patients with repeated hospitalizations, only the first admission was included. Information on eligible cases was extracted from the electronic medical record system after ethical review by the Ethics Committee of the hospital. Because this was a retrospective clinical study, the requirement for informed consent was waived, and identifiable personal information was removed to protect patient privacy.

Inclusion and exclusion criteria

The study included patients with an ICD-10 diagnosis of bipolar disorders (ICD-code: F31) or depressive disorders (ICD-code: F32 & F33) [12]. Although the ICD codes include many subcategories that generally depend on the present clinical manifestation, we used only the generic diagnoses of MDD and BD. In short, BD was diagnosed if the patient had a (hypo)manic episode, a depressive episode, or mixed or alternating (hypo)manic and depressive symptoms at the time of the survey, and had at least one episode of another mood disorder in the past. MDD was diagnosed if the patient had a depressive episode at the time of the survey but had never had a (hypo)manic episode in the past. To avoid possible misdiagnosis, the diagnosis was cross-checked by attending physicians and finally confirmed by the Department Chief. Exclusion criteria were pregnancy, chronic infectious diseases including viral hepatitis and syphilis, autoimmune diseases including Hashimoto thyroiditis and asthma, diabetes, and malignant tumors or cancers. After applying these criteria, a total of 721 participants were included in the study (Supplement Fig. 1).

Data collection

We collected epidemiological data for the 721 participants, including age, gender, duration of illness, and marital status. We also recorded the use of several classes of psychotropic drugs (antipsychotics, antidepressants, mood stabilizers, and benzodiazepines) in the month preceding study entry, which we defined as present psychotropic medication use. In addition, results of routine blood tests were collected. Blood for the selected tests was drawn from a forearm vein in the morning of the second hospitalization day, after at least 10 h of fasting.

Potential predictors selection

Potential predictors were selected using a combination of traditional statistical methods and machine learning approaches. Firstly, an initial variable screening was performed using univariable analysis [13], and only covariates with a p-value of less than 0.01 were retained for subsequent analysis. Secondly, predictors were further selected from these variables using the best subset selection method, via the leaps package (Version 3.1) with complete cases [14]. We set the maximum subset size to eight, which is also the default in the function. Information criteria for the different subset sizes, including Mallows’ Cp (CP) and the Bayesian information criterion (BIC), were plotted to help determine the best subset [15]. Thirdly, decision curve analysis (DCA) was used to choose the final model when different information criteria pointed to different best subsets [16].
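The quantity that decision curve analysis compares between candidate models can be illustrated with a minimal sketch. It assumes the standard definition of net benefit at a threshold probability pt, namely TP/n − (FP/n) × pt/(1 − pt). The function name and the toy data are hypothetical and are not taken from the study, whose analyses were run in R.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of labelling as positive every case with y_prob >= threshold."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    # True and false positives among cases at or above the threshold.
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1 - threshold)


# Hypothetical outcomes (1 = BD, 0 = MDD) and predictions from two toy models.
y = [1, 1, 1, 0, 0, 1, 0, 1]
model_a = [0.90, 0.80, 0.70, 0.40, 0.20, 0.60, 0.30, 0.55]
model_b = [0.60, 0.70, 0.40, 0.60, 0.30, 0.50, 0.55, 0.45]

for pt in (0.3, 0.5, 0.7):
    print(pt, round(net_benefit(y, model_a, pt), 3), round(net_benefit(y, model_b, pt), 3))
```

Plotting net benefit against a range of thresholds for each model gives the decision curve; the model whose curve lies higher over the clinically relevant thresholds is preferred.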

Development of the preliminary prediction model

The preliminary prediction model was developed using logistic regression, and variables were excluded if their coefficients became non-significant after adjusting for psychotropic medication use. Since variables were selected without considering observations with missing data, we used multiple imputation by chained equations (MICE) to avoid bias or inefficient parameter estimates [17]. All blood test results, together with age, gender, and the dichotomous outcome variable, were included in the imputation. Under the assumption that data were missing at random (MAR), the predictive mean matching (PMM) method was used to impute the missing values using the mice package (version 3.15.0) in R. Because complete case analysis may introduce bias, we used the imputed datasets for consistency checks. If the conclusions drawn from the observed cases and the imputed cases were consistent, we could be confident that the conclusions were reliable.

Model presentation and examination

The preliminary prediction model was presented in the form of a nomogram. Its performance was assessed in two aspects, discrimination and calibration, using the observed data, that is, cases without missing values for the selected variables.

Discrimination refers to the ability to distinguish between the two outcomes and can be assessed by the concordance statistic (c-statistic). In logistic regression analysis, the c-statistic is equal to the area under the ROC curve (AUC) [10]. A higher AUC indicates higher accuracy. A model was considered superior to random ordering if the AUC exceeded 0.5; AUC values from 0.5 to 0.7 were taken to indicate mild performance, and values from 0.7 to 0.9 moderate performance. In addition, sensitivity, specificity, and the likelihood ratios [18], including the positive likelihood ratio (LR [+]) and the negative likelihood ratio (LR [-]), were calculated to further test the accuracy of the model.
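As an illustration of these discrimination measures, the following minimal sketch computes the c-statistic as the proportion of correctly ranked (positive, negative) pairs, which equals the AUC for a binary outcome, and then sensitivity and specificity at a chosen cutoff. The function names and toy data are hypothetical; this is not the study's code.

```python
def c_statistic(y_true, y_score):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def sensitivity_specificity(y_true, y_score, cutoff):
    """Classify as positive when score >= cutoff, then tabulate the 2x2 table."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted = 1 if s >= cutoff else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical outcomes (1 = BD, 0 = MDD) and predicted probabilities.
y = [1, 1, 0, 0, 1, 0]
score = [0.9, 0.3, 0.4, 0.1, 0.8, 0.2]
print(c_statistic(y, score))
print(sensitivity_specificity(y, score, cutoff=0.5))
```

The pairwise loop is quadratic in sample size, which is acceptable for a cohort of a few hundred patients; rank-based formulas give the same value more efficiently on larger data.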

Calibration, which evaluates the goodness of fit of the prediction model, was assessed by calibration curves [19], with the final regression model subjected to bootstrap validation (1,000 bootstrap resamples) via the rms package (Version 6.3-0). In addition, the Hosmer-Lemeshow test was used to test model fit.
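The Hosmer-Lemeshow statistic can be sketched as follows, assuming the common formulation in which cases are sorted by predicted probability, split into g groups of near-equal size, and compared on observed versus expected event counts, with the statistic referred to a chi-square distribution with g − 2 degrees of freedom. The number of groups used in the study is not stated, so g = 10 here is an assumption, and the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (statistic, p-value); probabilities must lie strictly in (0, 1)."""
    n = len(y_true)
    if n < groups:
        raise ValueError("need at least one case per group")
    # Sort by predicted probability only, so ties keep their input order.
    pairs = sorted(zip(y_prob, y_true), key=lambda t: t[0])
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # expected events in the group
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat, chi2_sf_even_df(stat, groups - 2)


# Hypothetical data: 40 cases with predicted probabilities spread over (0, 1).
probs = [(i + 0.5) / 40 for i in range(40)]
events = [1 if (i * 7) % 40 < i else 0 for i in range(40)]
print(hosmer_lemeshow(events, probs, groups=10))
```

A large p-value means the test finds no evidence of poor fit; it does not by itself establish good calibration, which is why it is read alongside the calibration curve.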

Internal validation was performed using 10-fold cross-validation repeated 10 times [13], via the caret package (Version 6.0–93). Moreover, different subsets were used to further validate the model, including the drug-naïve subset and three age subgroups (14–29, 30–44, and 45+ years).
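The resampling scheme behind repeated 10-fold cross-validation can be sketched as below. This is a simplified, unstratified illustration with hypothetical names; it only generates the train and test index sets and does not reproduce how the caret package builds its folds.

```python
import random


def repeated_kfold_indices(n, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        # Each repeat reshuffles the cases before cutting them into k folds.
        order = list(range(n))
        rng.shuffle(order)
        for f in range(k):
            test_idx = order[f * n // k:(f + 1) * n // k]
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            yield r, f, train_idx, test_idx


# With 10 folds and 10 repeats the model is refitted and scored 100 times.
n_splits = sum(1 for _ in repeated_kfold_indices(700))
print(n_splits)  # 100
```

In each split the model would be fitted on `train_idx` and its c-statistic computed on `test_idx`, so no case is used for both fitting and scoring within the same split.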

Statistical analysis

Data for normally distributed continuous variables are presented as mean and standard deviation (SD), skewed data as median (25th and 75th percentiles), and categorical variables as absolute numbers and percentages. The Shapiro-Wilk test was used to check whether continuous variables were normally distributed, and Levene’s test was then used to assess the homogeneity of variance. Clinical characteristics were compared using Student’s t test for normally distributed variables with equal variance, Welch’s t test for normally distributed variables with unequal variance, and the Wilcoxon rank sum test for skewed variables. Categorical variables were compared using Pearson’s Chi-squared test, or Fisher’s exact test when required. All statistical analyses were conducted using the freely available statistical software R (version 4.2.0). All reported significance levels were two-sided, with alpha set at 0.05.

Results

Epidemiological and clinical characteristics

In total, 721 patients were included in the current study, 234 in the MDD group and 487 in the BD group. Characteristics of the study population are given in Table 1. There were no statistically significant differences between patients with MDD and BD in gender, duration of illness, or family history of mental disorders, whereas the two groups differed in age, marital status, and rates of use of antipsychotics, antidepressants, and mood stabilizers (p < 0.01).

Table 1 Epidemiological characteristics of patients with MDD or BD

Notably, 226 (31.34%) participants had some degree of missing data in their blood test results, most of which was concentrated in the inflammatory and immune response examinations (Supplement Fig. 2).

Variable selection

After preliminary screening by univariable analysis, 22 potential biomarkers (including age) with a p-value < 0.01 were carried forward to best subset selection (Supplement Table). As shown in Fig. 1, the subset with eight variables had the smallest CP (Fig. 1A), while the subset with five variables had the smallest BIC (Fig. 1B). In the eight-variable model, however, the regression coefficient for the platelet-to-lymphocyte ratio (P.L) was not significant (p ≈ 0.09), so P.L was removed after verifying that its exclusion did not make a statistically significant difference. Subsequently, DCA showed that the resulting seven-variable model had a moderately greater clinical benefit overall (Fig. 1C). This model consisted of age (unit: years), eosinophil count (Eos, unit: 10^9/L), plasma concentrations of thyroid-stimulating hormone (TSH, unit: μIU/mL), follicle-stimulating hormone (FSH, unit: mIU/mL), prolactin (PRL, unit: ng/mL), total cholesterol (TC, unit: mmol/L), and low-density lipoprotein cholesterol (LDL, unit: mmol/L).

Fig. 1

(A-B) Best models for each subset size based on Mallows’ Cp (CP) and the Bayesian information criterion (BIC). (C) Decision curve analysis for the model with 5 variables (sub.fit.5) and the model with 7 variables (sub.fit.7). WBC, white blood cell count; PLT, platelet count; Lym, lymphocyte count; Eos, eosinophil count; P.L, platelet-to-lymphocyte ratio; N.L, neutrophil-to-lymphocyte ratio; E.L, eosinophil-to-lymphocyte ratio; FT3, free triiodothyronine; FT3.FT4, free triiodothyronine-to-free thyroxine ratio; TSH, thyroid stimulating hormone; FSH, follicle stimulating hormone; E2, estradiol; PRL, prolactin; ALT, alanine transaminase; UA, uric acid; LDL, low density lipoprotein cholesterol; TC, total cholesterol; IgA, immunoglobulin A; ALB, albumin

The multiple logistic regression analysis incorporating the 7 selected variables is shown in Table 2. After adjusting for present psychotropic medication use, the coefficients (β) and odds ratios (exp(β)) of TSH and FSH became non-significant, so these two variables were removed. The imputed dataset gave results consistent with those of the complete dataset.

Table 2 Multivariable regression for diagnosis between MDD and BD in patients

Presentation of the preliminary prediction model

The final model incorporating the five potential independent predictors, age, LDL, TC, Eos, and PRL, was presented as a nomogram (Fig. 2).

Fig. 2

The nomogram developed in the observed population, incorporating age, total cholesterol (TC), low-density lipoprotein cholesterol (LDL), eosinophil count (Eos), and prolactin (PRL)

Evaluation of model performance

For the above nomogram, the c-statistic was 0.858, indicating good discrimination (Fig. 3A). With a cutoff value of 0.66, the model showed a sensitivity of 0.716 and a specificity of 0.890. The corresponding LR [+] and LR [-] were 6.51 and 0.32, suggesting moderate shifts in the probability of a correct diagnosis using the model.
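The reported likelihood ratios follow directly from the reported sensitivity and specificity through the standard definitions LR [+] = sensitivity / (1 − specificity) and LR [-] = (1 − sensitivity) / specificity. The short check below uses only the values stated above; the function name is hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Values reported for the nomogram at the 0.66 cutoff.
lr_pos, lr_neg = likelihood_ratios(0.716, 0.890)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")  # LR+ = 6.51, LR- = 0.32
```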

Fig. 3

(A) Receiver operating characteristic (ROC) curve for the diagnostic model to distinguish patients with MDD from those with BD. For logistic regression models, the c-statistic is equal to the area under the ROC curve (AUC). (B) Calibration curve. The x-axis represents the predicted probability and the y-axis represents the actual probability of BD diagnosis. Perfect prediction would correspond to the 45° dashed line; the dotted line represents the observed cases (n = 700), and the solid line is bias-corrected by bootstrapping (B = 1000 repetitions)

The calibration plot indicated that predicted probabilities approximately matched actual probabilities for this model (Fig. 3B). The Hosmer-Lemeshow test p-value was 0.705, indicating no evidence of poor model fit.

Validation of the preliminary prediction model

The average c-statistic of the repeated cross-validation was 0.853 (range 0.850 to 0.856) (Fig. 4A). This was close to, but slightly lower than, the overall model c-statistic of 0.858, indicating the stability and reliability of the preliminary predictions within the study population. Moreover, subset validation with ROC curves further supported the robustness of the model. In the drug-naïve subset, the AUC was 0.826, indicating good discrimination (Fig. 4B). In the different age subsets, the AUC ranged from 0.671 to 0.739, indicating mild to moderate discrimination (Fig. 4C1-C3).

Fig. 4

(A) Box plot of the AUC, or c-statistic, across the 100 cross-validation samples (10-fold cross-validation repeated 10 times). (B, C1-C3) ROC curves distinguishing patients with MDD from those with BD in different subgroups, including the drug-naïve group (B) and the age groups (C1-C3): 14–29, 30–44, and 45+ years, respectively

Discussion

After many years of effort, researchers have not yet constructed a clinically useful prediction model for discriminating between BD and MDD. In the present study, we preliminarily developed and validated a diagnostic nomogram, based on a composite of biomarkers from routinely tested blood results, to distinguish MDD from BD. The model was constructed using the best subset selection method, then verified using multiple imputation and adjusted for psychotropic medication use. The final model consisted of five variables: age, LDL, TC, Eos, and PRL. The model could discriminate between MDD and BD with an AUC of 0.858, a sensitivity of 0.716, and a specificity of 0.890.

During the construction of the model, 47 features were reduced to 22 potential predictors in the first step by univariable analysis, and the best subset selection method then selected seven prominent markers. Of the 721 patients in the study, only the 495 without missing data were used for the primary multivariable selection. After deleting cases with incomplete values for the prominent variables, 700 patients were used for adjustment and evaluation of the model, which made the findings relatively more robust than constructing and validating the model in exactly the same population. Moreover, repeated cross-validation, in which the training and test datasets did not overlap, was subsequently used to verify the model, and subset validations were used to test the effectiveness of the model in drug-naïve patients and in patients of different age groups.

The findings of the present study were broadly consistent with previous studies. For example, age is one of the most profound distinguishing factors between MDD and BD, as it is broadly accepted that the onset age of MDD is generally later than that of BD [20, 21]. However, we wanted to see how the performance of the composite biomarkers would change if the effect of age was minimized. We therefore divided patients into three age groups: 14–29, 30–44, and 45+ years. Within each group, age no longer differed significantly between MDD and BD patients (data not shown). Unsurprisingly, model discrimination deteriorated to varying degrees, with AUCs of 0.688, 0.671, and 0.739, respectively, indicating that the model still had mild to moderate diagnostic efficiency in patients of the same age group.

Moreover, eosinophil count could also help discriminate between the two disorders, which was consistent with previous studies. For example, it has been demonstrated that eosinophil counts were reduced in MDD patients [22], while increased eosinophil function has been found in late-stage BD [23].

In addition, the inclusion in the model of PRL, one of the hormones secreted by the pituitary gland, suggested that pituitary function might play a role in differentiating MDD and BD. However, previous studies on the pituitary gland mainly focused on gland volume changes in mental disorders and their association with hyperactivity of the hypothalamic-pituitary-adrenal axis [24,25,26]. Pituitary hormones other than adrenocorticotropic hormone could also have potential effects on mental disorders. In this study, TSH and FSH were statistically significant initially but were excluded after adjustment for psychotropic medication use, which accords with the clinical consensus that endocrine function is greatly influenced by drug treatment for affective disorders [27]. Interestingly, PRL remained in the model after medication adjustment. However, these findings require further confirmation in drug-free patients.

In addition, LDL and TC were included in the final model. These findings did not contradict previous findings that abnormal lipid metabolism was more prevalent in MDD and BD patients than in healthy controls [28, 29]. However, few studies have compared the differences in lipid profile distribution between MDD and BD. Our study showed that BD patients had relatively higher LDL levels, while MDD patients had higher TC levels. Although these findings indicated different lipid profiles in MDD and BD patients, both were consistent with the findings that patients with severe mental illnesses had increased risks of cardiovascular diseases [30, 31].

Moreover, like endocrine function, lipid metabolism is seriously affected by some psychotropic drugs, especially antipsychotics and mood stabilizers such as clozapine, olanzapine, and valproate [32], which can ultimately result in hyperlipidemia or even obesity. As shown in Table 1, the proportion of BD patients using antipsychotics and mood stabilizers was significantly higher than that of MDD patients. Nevertheless, the coefficients of TC and LDL in the regression model remained significant after adjustment for medication use, indicating that pharmacological effects were not the only reason for the differences in lipid levels between the two groups. In other words, abnormal lipid metabolism may underlie the mental disorders. However, since cholesterol levels can be greatly influenced by living habits such as diet and physical activity [33], this finding may not apply to populations with different lifestyles.

Emerging studies have confirmed the potential roles of inflammation or immune-based biomarkers as predictive biomarker panels to differentiate MDD and BD, usually including C-reactive protein (CRP), interleukins, and complement components [5, 34,35,36]. Surprisingly, however, these potential biomarkers were excluded during model development, which was inconsistent with previous findings. For example, Chang et al. demonstrated that baseline CRP could serve as a discrimination biomarker for MDD and bipolar II disorder in drug-naïve patients (cutoff value: 621.6 ng/mL; AUC value: 0.816), and patients with baseline CRP greater than 621.6 ng/mL had 28.2-fold higher odds of bipolar II disorder [37]. In our study, by contrast, CRP level showed no statistical difference between MDD and BD and was excluded in the first step. One possible reason is treatment effects, as indicated by Chang’s work itself: the difference in CRP level between MDD and bipolar II disorder became narrower after treatment. Another possible reason is bias from the concentration of missing values in inflammation and immune factors; although the multiple imputations indicated that the missingness of the variables selected in the model was at random, this may not reflect the missingness pattern of the potential predictors in question [38].

There were several limitations to this study. Firstly, behavioral characteristics and psychological assessments were not included in the analysis. Secondly, BD patients were not classified by clinical phase, such as (hypo)manic or depressive phase, mixed episode of BD, or rapid cycling BD. Thirdly, participants were included during the acute phase and blood examinations were performed on the second hospitalization day; this process was constrained by the clinical practice of the hospital, and the results may need further evaluation in participants in remission. Finally, the data were collected from a single hospital, so the generalizability of the preliminary prediction panel needs further testing in an external validation cohort.

Besides exploring distributional differences in blood indicators, emerging research has been investigating the pathophysiology of MDD and BD at multiple molecular levels [39]. As technology continues to develop and costs decrease, it can be expected that a valid and convenient composite of biomarkers will be constructed by combining biomolecular components with ordinary clinical indicators, which could effectively distinguish between MDD and BD and also guide precision treatment in the future.

Conclusion

Our study presents a nomogram that incorporates commonly tested blood indicators and could conveniently help distinguish MDD from BD, and thus reduce misdiagnosis.