Introduction

Implementing secondary prevention methods with suicidal patients is one of the best public health tools we have to actually avert suicidal outcomes1,2. Suicidal patients are often assessed in the Emergency Room (ER) or outpatient clinics, and most of them are not hospitalized. Psychiatrists and psychologists modulate clinical care depending on the assessments made in clinical settings, for instance, increasing or reducing the frequency of consultations or adapting the pharmacological treatment, but we have only indirect information about the situation of the patients in their daily life. This situation has changed in the last decade with the widespread diffusion of smartphones, which allows the use of Ecological Momentary Assessment (EMA) to monitor suicidal patients routinely. EMA follow-up could eventually lead to timely Ecological Momentary Interventions (EMI) and boost prevention efforts. A good example was provided by Wang et al. in a prognostic study with 83 inpatients that completed EMA surveys of Suicidal Ideation (SI) several times per day during their hospitalization3. The real-time data of SI allowed them to predict with great accuracy the risk of post-discharge suicide attempts. However, EMA studies in suicidology have been neglected until recently4 and those that exist are limited in their sample size or duration5, mainly due to the fatigue of the patients with the use of the assessment apps and the sensitivity of the questions they answer6.

Although ecological studies of suicidal patients focused first on the intensity and duration of suicidal ideas, the finding of sharp changes in very short periods pushed the interest in measuring their variability7. One of the first studies in this area used paper-and-pencil methods and asked university students to fill up a daily battery of questionnaires for 4 weeks7,8. Their results showed an association between SI variability and a previous history of suicide attempts and suggested that SI variability in multiple attempters was independent of their mood. More recent work, based on EMA, suggests that SI variability could be a marker of risk during the follow-up of suicidal patients3,9,10. For instance, the predictive accuracy of Wang et al.’s model, which is described above, was improved with dynamic data on SI changes3. In another paper, Oquendo et al. examined 6 weeks of discontinued EMA data of 51 depressed patients and found stable levels of SI variability for each patient over 2 years9. They also found that high variability could increase the propensity to experience SI when exposed to stressful life events. In line with this finding, impulsivity traits were associated with SI variability, but not SI intensity, in another EMA study that followed up 84 depressed patients for 10 days10.

SI variability may be associated with a particular patient profile characterized by mood or affective instability, which is defined by sudden and recurrent changes in affect. For instance, the level of affective instability predicted SI variability, independently of depression severity, in a sample of female patients with borderline personality disorder11. Affective instability has also been associated with peaks of SI in other samples such as working women12, and patients suffering from depression and anxiety13, bipolar disorder14, or psychosis15.

In this paper, we have analyzed the data of the Smartcrises study16, the largest and longest EMA study with suicidal patients to date. The EMA questions in the Smartcrises study focus on three factors that have been related to the emergence of SI (namely sleep, appetite, and social connectedness), as well as SI and suicidal risk. The choice of sleep, appetite, and social connectedness was based on their potential use as behavioral markers of depression and suicide risk, independently of cultural determinants. Sleep disturbances, particularly insomnia and nightmares, are associated with an increased risk of suicidal behavior (including SI, attempts, and suicide) across diagnoses, cultures, and age groups17. Changes in social connectedness caused by acute psychosocial stress, such as the experience of social exclusion, can act as a trigger for suicidal behavior and induce specific biological and psychological changes18. Appetite variations have been also associated with depressive symptomatology and suicidal behavior19,20.

The aim of this paper was to study the clinical profile of suicidal patients according to the variability of their EMA responses. Consequently, we clustered them according to EMA variability during the follow-up and then compared the demographic and clinical features between the clusters. We hypothesized that high variability would be associated with features of clinical severity, such as depression severity or clinical events during follow-up. The results could help to improve the design of future EMI for suicide prevention. An overview of the study is provided in Fig. 1.

Figure 1
figure 1

Overview of the study. Notice that the main aim of this paper is to find and analyze which clinical and demographic features could be associated with each variability profile. The candidate set of clinical and demographic features (whose possible association to the variability profiles was analyzed) was obtained upon inclusion and at the end of the follow-up. Hence, some longitudinal features were considered by computing the change at those discrete time instances.

Methods

Sample

This study analyzes a set of 275 out of 419 patients from the Smartcrises study16 that complies with all of the following requirements: (1) a complete diagnostic assessment with Mini International Neuropsychiatric Interview (MINI) version 7.0.2 is available; (2) the age and gender of the patients are known; and (3) the EMA variability can be computed in at least one domain, i.e. the participant has answered at least to three prompts in a domain (notice that EMA variability and domains are formally defined in the “Statistical analysis” section, see also Details of the patient selection process in Supplementary Material). Compared to included patients, non-included patients were more likely to be older and retired. They also showed smaller decreases in depressive symptomatology measured by the Inventory of Depressive Symptomatology (IDS), from baseline to the end of follow-up, and higher scores of psychological pain at inclusion (P values < 0.05). None of these differences persisted after correction for multiple comparisons. Refer to the Supplementary Material for the details about the patient selection process; for instance, Fig. S3 shows a Venn diagram with numbers and reasons for exclusion.

Patients were recruited from outpatient and emergency psychiatric departments in five clinical centers across Spain and France. EMA was performed with the MeMind Wellness Tracker app (available at the App Store and Google Play) as software, and the participants’ smartphones as hardware. The language of the questionnaires and the mobile application, which included all assessment scales and EMA questions, was adapted to the country. Relevant inclusion criteria include being 18 years old or older, and a clinical assessment due to a suicidal crisis in the last 7 days. Patients diagnosed with a current manic, hypomanic or mixed affective episode, or any lifetime psychotic disorder were excluded. Patients with bipolar disorder could be included only in the depressive phases of the illness and in the absence of any manic or mixed symptomatology. Signed written consent (i.e., informed word) was obtained from all participants. Ethics and privacy regulation, including, but not limited to the Declaration of Helsinki21, was followed, and all methods were carried out in accordance with relevant guidelines and regulations. The study protocol was approved by the Institutional Review Board (IRB) of Fundación Jiménez Díaz Hospital, Spain, on the 25th of June 2017 (LSRG-1-005-16), and by the Comité de Protection des Personnes Ouest IV, France, on the 3rd of July 2018 (20187-A02634-49). The study is registered as a clinical trial since October 2018 (ClinicalTrials.gov Identifier: NCT03720730).

Study design

The Smartcrises study combines validated clinical scales with EMA to assess the health condition of the patients. The battery of self-reported and clinician-reported questionnaires was administered in clinical visits upon inclusion and after 6 months at the end of follow-up. EMA questions were presented longitudinally throughout the study and comprise several clinical domains: (1) the assessment of suicide risk; (2) the wish to live/die; the level of social connectedness, measured in two parts: (3) social support and (4) social withdrawal; (5) sleep disturbances; and (6) the level of appetite. Notice that one of the main deterrents of EMA are fatigue effects, which consist in a decrease in the response rate as time goes by due to the repetitiveness of the questions, and this may result in withdrawal from the study22,23. To tackle this problem and obtain EMA data beyond 1–2 weeks, both the frequency and the set of asked questions changed during the study. Up to 32 EMA questions were defined. Importantly, questions on wish to live/die, instead of direct questions on SI, were chosen to minimize the potentially adverse effects of repeated prompts over a long period. The wish to live/die captures passive ideas of suicide, the main outcome of the Smartcrises study, and is a proxy measure of SI24,25. Indeed, passive SI is strongly correlated with active SI and other suicidal outcomes, including suicide death, according to a recent metanalytic review26.

The EMA assessment is inspired by the Salzburg Suicide Process (SSP) questionnaire, which was designed to assess dynamic changes in suicidal risk27. Most of the questions of the SSP questionnaire are extracted from validated clinical questionnaires. Hence, as shown in Table 1, our EMA assessment compiles questions relevant for a dynamic follow-up of suicidal patients along different domains. Each domain is evaluated with questions inspired from a given validated clinical questionnaire. In particular, EMA questions include items from the Suicidal Status Form (SSF)28, the Perceived Social Support Questionnaire (PSSQ)29, the Interpersonal Needs Questionnaire (INQ)30, the Insomnia Severity Index (ISI)31, and the Council of Nutrition Appetite Questionnaire (CNAQ)32. However, only one to five of the 32 questions were randomly chosen every day to be presented to the patients. The frequency of the questions was progressively reduced based on the evolution of the suicide risk after an attempt: 4–5 prompts in the first month, 3–4 in the next 2 months and finally 1–2 prompts in the last 3 months33. It must be mentioned that the EMA questions were not selected completely at random; instead, those related to sleep and suicide were given much higher probabilities of being asked, due to their importance, and a turn-over system was used in order to avoid repetitions. This approach reduces EMA fatigue, at the expense of every set of responses for the same question being sampled non-equidistantly (non-uniformly).

Table 1 EMA questions.

Data management

A total of 48,489 answers to the 32 EMA questions from the 275 patients were analyzed in this study. Those answers were appropriately transformed so that they were all expressed in the range 0 to 100; 100 indicating the worst possible mental condition of the patient, regardless of the fact that some of them were reversed worded34. This transformation was done for clarity and displaying purposes to allow visual comparison. Parenthetically, the scores (but not individual items) of the clinical questionnaires were transformed in like manner.

Besides EMA data, information from regular assessments in the Smartcrises study includes demographic and clinical data, as well as validated questionnaires. Demographic data comprised: the number of years studying since the beginning of primary school, the employment situation, the marital status, and the country of recruitment. Clinically meaningful events during follow-up (including self-harm, suicide attempts, emergency visits, or hospitalizations in psychiatry that were obtained from clinical records) were computed in a binary variable. Depressive symptoms were assessed with the clinician-reported version of the IDS35. The Columbia-suicide severity rating scale (C-SSRS)36 was used to assess recent suicidal ideas and lifetime suicidal behavior at inclusion.

In addition, other features were computed from the following questionnaires, which were all self-reported. The overall score of the following scales was obtained: the medication adherence rating scale (MARS)37, the CAGE questionnaire for alcohol addiction38, the Fagerström Test for Nicotine Dependence (FTND)39, the List of Threatening Experiences (LTE)40, and the insomnia severity index (ISI)31. The dimensions of emotional, physical, and sexual abuse or neglect were obtained from the childhood trauma questionnaire (CTQ)41. The 11th version of the Barratt Impulsiveness Scale (BIS)42 was used to measure an overall score of impulsivity, as well as the first-order factors of the scale. Further, moral and physical pain were obtained via visual analog scales (VAS)43 that were administered upon inclusion. All these instruments were administered at baseline and at the end of follow-up, except the LTE, BIS, VAS, and CTQ, which were used only at the baseline assessment.

The mean of available values for each patient, or ipsative mean44, was used to impute the missing values after accounting for reversed-scored questions. This technique is based on the fact that the items in each questionnaire (or its dimension) are related to a single psychometric construct and should be highly correlated. Unlike simply removing the incomplete data (a “complete case analysis”), this technique can help reduce potential biases45. Nonetheless, if the missing rate of any questionnaire was over 20% for a given patient, the data was discarded46. Score changes, instead of last point values, were used when follow-up questionnaires were available to extract longitudinal features. Finally, marginal diagnoses representing less than 5% of the patients and overall scores of questionnaires with different dimensions were not considered in order to reduce the number of features in the dataset. All data management and statistical analysis were performed using the MATLAB software47.

Statistical analysis

EMA variability was assessed by computing the Median Absolute Deviation (MAD) of the absolute value of the successive slopes48. Refer to Supplementary Material, and Fig. S2, for a discussion about the choice of such a metric. Notice that variability was not computed for each individual EMA question directly, since some items (like those related to sleep quality) were intentionally presented more often than others. Thus, the EMA questions were grouped into six domains, namely: suicide risk, wish to live/die, social support, social withdrawal, sleep, and appetite. They correspond to the EMA questions 1 to 6, 7 and 8, 9 to 12, 13 to 15, 16 to 25, and 26 to 32, respectively, as shown in Table 1. Each domain was defined by grouping the EMA questions from a validated clinical questionnaire (for instance, sleep with the ISI or appetite with the CNAQ), with the exception of the two questions on death wish and wish to live. In this way, EMA questions from the same domain were treated as the same question, but expressed differently, and the variability was computed for each domain, containing a larger number of observations, as opposed to analyzing the questions separately. To validate the domains, the Kendall correlation coefficients49 of the EMA questions within each domain were computed to make sure that they were positively correlated.

A Gaussian Mixture Model (GMM) was used to cluster the patients based on their multidimensional variabilities (i.e., the EMA variabilities along the six domains). This provided a comprehensive analysis of the variability of the patients, which is not limited to a single clinical aspect (e.g., SI). The GMM’s inference was performed via the Expectation–Maximization (EM) algorithm, which allows to infer missing variability domains by taking the conditional expectations of the missing values given the current parameter estimates and the observed values at every maximization (a.k.a. M) step50. The optimal number of groups was determined by the Bayesian Information Criterion (BIC), which produces accurate results even in the inimical scenario of not missing-at-random (NMAR) data51. The analysis of the clinical differences between the groups is twofold. On the one hand, phenotypic profiles were compared using the two-sample Student’s t test computing the pooled estimate of the standard deviation for the numeric variables, and the Pearson’s \({\chi }^{2}\) test of homogeneity for the categorical ones. All tests were two-tailed. P values were adjusted for multiple comparisons applying the Holm–Bonferroni method52. On the other hand, a forward selection was performed to find the optimal set of clinical features that can be used by a random forest to predict the variability group each patient belongs to.

A random forest is a machine learning algorithm that uses bagging (i.e., a two-step process that stands for “bootstrap aggregating”). In the first or “bootstrap” step, several decision trees53,54, a.k.a. weak learners, are trained over different bootstrap samples of the dataset, each one omitting roughly 36.8% of the patients. Omitted patients in each tree are referred to as “out of bag”. In the second or “aggregating” step, the final classification is made taking a democratic (non-weighted) average vote amongst all the trees, thus reducing the risk of overfitting. To further prevent overfitting, diversity (decorrelation among the constituents of the random forest) was enforced by randomly selecting the clinical features that could be used for each decision split55. However, some features have significantly more missing data than others (e.g., those measuring follow-up change) and this could bias the forward selection procedure. Hence, surrogate splits were used56. In this way, if the optimal feature is missing when classifying the patients, the best surrogate feature will be used to take a decision split and to keep the movement from the root to the leaves of the trees, eventually classifying the patients without discarding partial observations. Since there are both numeric and categorical features, the selected one for each split minimized the P value of the Pearson’s \({\chi }^{2}\) tests of independence between each proposed clinical feature and the variability groups. Unlike the standard Classification and Regression Trees (CART) algorithm, this procedure is not biased toward those features that have many levels, not underestimating the importance of the categorical ones57.

The hyperparameters of the random forest were adjusted by employing Bayesian optimization58 upon each candidate feature set analyzed during the forward selection. The performance was assessed computing the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve59 of the predictions of the out of bag patients. Notice that when making predictions for the out of bag patients, only the subset of decision trees that have not been trained using a particular set of patients is used, yielding an unbiased estimator of the true random forest performance that approaches that of leave-one-out cross-validation, but which greatly reduces the computational burden60. Details of the statistical methods are provided in the Supplementary Material.

Results

Sample description

The study sample of 275 patients had a mean age of 40 years with a Standard Deviation (SD) of 14 years, approximately. It was mainly composed by females (n = 185, 67.27%), where “n” is the number of patients. Almost half of the patients were single (n = 123), and roughly one-third (n = 98) were married or had been cohabiting for more than 6 months. Study participants reported intermediate to high educational levels, with a mean number of years studying since the beginning of primary school of 13 with a SD of 3, approximately. There was a balanced proportion of around one-third of employed and unemployed patients, and patients with an incapacity to work. The main psychiatric diagnoses were post-traumatic stress disorder (n = 74, 26.91%), major depressive disorder (n = 40, 14.55%), and binge eating disorder (n = 23, 8.36%). The base rates of clinically meaningful events during follow-up in the total sample (n = 374) are 6.15% for SI, 1.60% for self-harm, 10.43% for suicide attempts, and 5.08% for emergency hospitalizations.

The participants answered the EMA questions for a mean time period of approximately 130 days with a SD of 104 days. The average length of follow-up was 148 days (SD = 116) in low-variability patients and 98 days (SD = 66) in high-variability patients. The participants answered a mean number of 176 EMA questions with a SD of 161. The mean number of answered questions was 190 (SD = 183) in low-variability patients and 150 (SD = 104) in high-variability patients. The differences between the groups were statistically significant in both cases: t273 = 3.91, P < 0.001 and t273 = 2.00, P = 0.046, respectively. The EMA response percentage was 43.25% for all the patients (maximum: 72.64%, minimum: 16.97%), 45.48% (maximum: 70.53%, minimum: 20.48%) for the low-variability group, and 39.11% (maximum: 83.07%, minimum: 7.64%), for the high-variability group. A detailed description of response rates through follow-up can be found in Fig. S1.

Validation of the domains

Figure 2 succinctly shows the correlations among the EMA questions after rescaling them from 0 to 100, accounting for reversed worded questions. There is high correlation amongst them, even if they do not belong to the same domain. The only exception corresponds to the items of the CNAQ assessing appetite that correlate strongly between them but not with items in other domains. Only positive correlations are statistically significant. All questions within the same domain are significantly correlated, with the exception of EMA3, “I am agitated (restlessness)”, which does not correlate with neither EMA4, “I am full of hope”, nor EMA7, “My wish to live is”. Therefore, EMA3 is removed from the “Suicide risk” domain. EMA3 does not correlate with many other questions, but the more independent one is EMA28, “During the last days I feel hungry”, since questions about appetite have the weakest inter-domain correlations.

Figure 2
figure 2

Correlation coefficients among the EMA questions. Correlation of the EMA questions after rescaling them from 0 to 100, where 100 represent worse health condition. Lower triangular portion: Kendall correlation coefficients; “–” for negative values. The main diagonal is intentionally empty for clarity. Upper triangular portion: P values of the hypothesis that the corresponding Kendall correlation value is 0; “*” for P values strictly lower than 0.05. The colormap on the right is used for both Kendall correlation coefficients and P values, and ranges from the overall minimum and maximum values of those quantities.

Cluster analysis

The BIC determined two variability groups. The first one, or low-variability group, has lower mean variability values and comprises 65% of the sample according to the GMM. The second group, or high-variability group, comprises the rest of the sample. The rest of the parameters are depicted in Fig. 3. Patients from both groups attain the highest mean variability values in the “Social withdrawal” domain. The highest increase in mean variability values when comparing the high- with the low-variability group are found in the “Sleep” and “Suicide risk” domains, with a 4.53- and 2.31-fold increase, respectively. With respect to the covariance in variability (not in mean values), all values are positive in the low-variability group, while there are negative variability interactions in the high-variability group (i.e., variability increases in one domain when the variability of other domain decreases and vice versa) as it happens with the “Sleep” and the rest of the domains. The analysis of the main diagonal elements of Fig. 3a,b reveals that the low-variability group is more homogenous, since it has lower variance in variability for all the domains, especially in “Suicide risk” and “Sleep”.

Figure 3
figure 3

The variability covariances and means of the low-variability group are shown in insets (a,c), respectively. The variability covariances and means of the high-variability group are shown in insets (b,d), respectively. Recall that they measure variability, not absolute values. For simplicity, only the lower triangular and main diagonal region of the variability covariance matrices is shown. Negative covariances indicate that increased variability in one domain is associated with lower variability in the other. The colormaps are the same across the groups to allow visual comparison, and they range from the overall maximum and minimum values of those qualities. Colormaps are shown at the top of each inset.

Cluster comparison

Table 2 shows the phenotypic profiles of the two variability groups, where no statistical difference is found after correcting for multiple comparisons. Figure 4 illustrates the ROC curve of the automatic classification of the high-variability from the low-variability group, using a random forest. The AUC is 0.74, with 95% Confidence Interval (CI) [0.68, 0.78], which is considered acceptable61. The hyperparameters obtained by the Bayesian optimization are listed in Table S2 in the Supplementary Material, and the importance of the ten clinical features chosen by the forward selection to make the classification with the random forest are shown in Fig. 5. According to this selection, the most relevant features to separate the groups (i.e., the features that are most useful for the random forest classification) are the depressive symptoms upon inclusion (IDS), followed by the impulsivity factor of cognitive instability (BIS), the marital status, and the frequency of SI at inclusion (C-SSRS), as well as the change in SI frequency, intensity and control from baseline to follow-up and other clinical factors such as nicotine dependence, binge eating disorders, or the occurrence of clinically meaningful events during follow-up. Single marital status and clinically meaningful events during follow-up were more common in the high-variability group, and all the other features showed also higher scores or higher prevalence in that group (Table 2).

Table 2 Demographics and clinical features by cluster according to variability level.
Figure 4
figure 4

ROC curve. The AUC is 0.74, with 95% CI [0.68, 0.78], estimated by taking 2000 bootstrap samples. Dashed blue line: random guessing reference. Solid red line: mean ROC curve of the automatic prediction of the high-variability group using the selected random forest and clinical features. Green area: AUC. Dashed red lines: 95% confidence intervals of the ROC curve.

Figure 5
figure 5

Importance of the ten clinical and demographic features used by the random forest to automatically discriminate patients in the low- and high-variability groups. For clarity, the y-axis is sorted in increasing importance order according to the random forest. The importance is computed by summing all changes in the impurity of the nodes from the parent to the two children thanks to a given clinical feature and its corresponding surrogate splits. Impurity is a measure of how the decisions of a node can separate patients in the low- and high-variability groups and it is measured by the Gini’s diversity index. The sum of impurity changes is normalized by the number of branch nodes.

Discussion

EMA and, by extension, EMI are very promising methods that could transform the field of suicide prevention. In this paper, we have analyzed a large EMA sample of suicidal patients to ascertain if symptom variability was associated with a higher risk of suicidal behavior and/or with other features of clinical severity. Symptom variability defined two clear-cut groups. The high-variability group is characterized by frequent changes not only in SI as previously reported, but also in domains such as social withdrawal, social support, sleep, or appetite. Furthermore, although conventional statistics found few differences between the groups when comparing cross-sectional data (Table 2), machine learning methods allowed us to obtain a good classification performance in this complex dataset with a large number of explanatory variables. Building on prior studies3,9, clinical variables such as the severity of depressive symptoms or the frequency of SI separate patients with high and low symptom variability. Importantly, although the occurrence of clinically meaningful events during follow-up, such as suicide attempts or hospitalizations, was not different between the groups according to standard statistics (p = 0.10), those events were almost 10 percentual points more frequent in the high-variability group and the variable was selected by random forest amongst the best features to differentiate high-variability patients.

Extant EMA studies with suicidal patients have shown so far that: (1) SI fluctuates widely, and (2) some clinical features, such as negative affect and disturbed sleep, could be used to predict SI fluctuations in the short-term5. However, fluctuations are not restricted to SI and sleep. In our study, social withdrawal was the clinical factor that showed the largest variability in both groups, a finding that could be related to the social sensitivity of suicidal patients. Among low-variability patients, social withdrawal was the only dimension with substantial variability, suggesting that in that group mood variations are mainly externalized through social interactions. Suicidal patients are sensitive to social cues and tend to interpret them negatively62, which may lead the patients to minimize their social interactions. An example of this sensitivity can be found in a recent study in which EMA-measured psychological pain correlated with orbitofrontal activation during a social exclusion paradigm in suicide attempters, but not in affective controls18. In the same vein, a small EMA study found that SI variability correlated strongly with SI intensity, but also with the intensity and variability of depressive symptoms and changes in social connectedness63. The pattern of variability extends thus well beyond SI and the instability of highly variable patients affects their social interactions, sleep and appetite, which is consistent with the fact that almost all EMA items correlated significantly with each other across clinical dimensions (Fig. 2). Interestingly, when high-variability patients fluctuate in one clinical dimension (such as suicide risk or sleep), they can be more stable in other dimensions. This is reflected by the negative correlations in Fig. 3 and suggests that some changes occur sequentially, rather than simultaneously, in the high-variability group. In contrast, fluctuations in the low-variability group tended to occur at the same time across the six clinical dimensions.

Some of the features that were selected by the machine learning algorithm to separate high- and low-variability groups have been related to impulsivity. They include one first-order factor of the BIS (i.e., cognitive instability) which reflects intruding and racing thoughts42, but also nicotine dependence64, the diagnosis of binge eating disorder65, and being single66. Since sleep disturbances predict the onset of SI67, it is interesting to note that the construct of cognitive instability has been recently identified as a transdiagnostic symptom associated with insomnia severity68. The role of impulsivity may also be related to affective instability since they are closely related, and partially overlapping, constructs69. A recent study associated SI variability with affective instability in Borderline Personality Disorder (BPD)11. We could not verify if BPD was overrepresented in the high-variability cluster because the disorder is not included in the MINI diagnostic assessment, but since childhood trauma and gender did not separate the clusters, the possibility of BPD diagnoses being concentrated in one of them is unlikely. We also lacked data on Attention Deficit and Hyperactivity Disorder (ADHD), another diagnosis that is frequently associated with cognitive impulsivity and emotional dysregulation. Impulsive reactions may also explain why high-variability participants stopped their follow-up earlier, having answered fewer EMA questions, than low-variability ones. Importantly, the relationship between SI and impulsivity is accentuated during suicidal crises70 but impulsivity can be the target of both pharmacological and psychotherapeutic interventions.

This study is based on a large sample of suicidal patients that were followed for several months using EMA methods. It is partially limited by missing follow-up data. According to a recent systematic review, the compliance in our sample was in the low range of EMA studies with suicidal patients (44 to 90%), but this could be expected given the long follow-up (prior studies ranging from 4 to 60 days) and the decline of compliance rates over time71. The use of a turnover pool of questions and a decreasing number of prompts through the follow-up seems to have reduced fatigue effects since most participants were still responding EMA questions after 4 months. Traditional methods are biased toward features that have fewer missing values. The methods applied in this study, like the EM algorithm used for the inference of the GMM, or the surrogate splits of the random forest were implemented to mitigate such a bias by exploiting the inherent structure of the data. This potential bias has also been tackled by following an agnostic approach for the feature selection, in which the algorithm was free to choose the set of features used for the classification.

A second limitation concerns the complexity of the database, which included a large number of variables and time-lagged information. We used a potent classifier based on random forest methods since off-the-shelf methods failed to provide meaningful results in this high-dimensional dataset with NMAR and correlated data. For example, in EMA, high variability patients respond more at the beginning of the follow-up, but drop out earlier than low-variability ones (Fig. S1). The surrogate splits of the random forest leveraged the natural correlations of the features to deal with the missing values, making those correlations beneficial, and the random feature selection and boosting procedure aimed to obtain robust results and to prevent overfitting72. Further, the random forest approach provides, as a byproduct, an objective tool to automatically identify patients with high variability only using demographic and clinical information, providing a probability estimate of the group each patient belongs to. The ability of the classifier to distinguish between groups was good, attaining an AUC of 0.74. Finally, two points should be noted regarding the clinical severity of the sample. First, some patients were recruited after being discharged from the hospital in outpatient consultations, which could select less severely suicidal patients. Second, although the patients excluded from the analyses were fairly similar to those included, higher basal depressive symptomatology and psychological pain suggest that they could represent more severe cases.

In summary, approximately one third of suicidal patients present high SI variability and a general pattern of instability in several domains during their follow-up, which is generally shorter. This pattern can be easily detected in early stages by assessing the severity of SI and depression, as well as impulsivity traits and other factors, and it might be associated with a higher risk of clinical events during follow-up. EMA protocols should be adapted to optimize suicidal risk assessment and preventive interventions in high- and low-variability groups.