Suicide was the leading cause of death from external causes in Spain in 2020, with 3,941 cases, 7.4% more than in 2019 (INE, 2021). Mental or behavioural disorders accounted for 4.4% of deaths in Spain (INE, 2021), while in Castilla y León (CYL) there were 228 deaths by suicide in 2020 (INE, 2021).

In CYL, the general trend in psychiatric hospitalizations was an annual statistically significant increase of 2% over 11 years (2005–2015) (Llanes-Álvarez et al., 2021). However, hospitalization in CYL tended to be lower in cases where the main diagnosis was alcohol or drug abuse/dependence (Llanes-Álvarez et al., 2020).

Additional efforts to prevent suicide may better focus on reducing the risk of suicide immediately following discharge (Williams et al., 2018). Mental disorders in themselves do not explain suicide, although an underlying mental disorder is present in most cases, and this vulnerability interacts with many psychological and social factors that lead some individuals to end or try to end their own lives (Haw & Hawton, 2015). Mental disorders are considered major risk factors for suicide (Moitra et al., 2021), with one study suggesting that ongoing efforts are required to improve access to and quality of mental health care in order to prevent suicide among individuals with mental disorders (Too et al., 2019).

For this reason, official information based on the minimum basic dataset (CMBD) (Melendez frigola et al., 2016) was gathered on admissions of acute patients associated with suicide in CYL. The objective of this research is to identify the key factors that have the most bearing on hospital readmissions of such patients with mental health problems. To this end, CHAID (Jojoa et al., 2021), random forest (Wang et al., 2021), logistic regression (Qasim & Algamal, 2018), and support vector machine (Jojoa-Acosta et al., 2021) were applied, together with a set based on the common outputs of the algorithms, to obtain a general overview of how the resulting system functions. Subsequently, to allow the specialist in psychiatry to conduct the assessment, a table was put together indicating which variables best explained the real situation facing each hospital under study. Lastly, the results obtained, based on the application of machine learning, conventional statistical methods, and expert assessment, are provided to support suicide prevention strategies that consider the features of each region.

The following section presents the materials used, describing the database (context-hospital time behavior), and the methods, with the basic theoretical principles supporting the selection of the machine learning techniques used. The results obtained from the analysis and the corresponding comparisons are then provided. Lastly, a discussion of the results is presented together with the limitations and conclusions deriving from this research.

Materials and Methods


State-of-the-Art Review (Context-Hospital Time Behavior)

CYL is an autonomous region comprising nine provinces (Avila, Burgos, Leon, Palencia, Salamanca, Segovia, Soria, Valladolid, and Zamora). It covers an area of 94,226 km² with 2,409,165 inhabitants as of 2020 (INE, 2021).

The distribution of hospitals and their corresponding hospital records of acute patients with suicide-related mental disorders according to the province are shown in Fig. 1.

Fig. 1

Map of CYL with total numbers of records with suicide-related diagnoses between 2005 and 2015

CYL health organization is based on territorial demarcations (Fig. 1). Figure 1 shows the total number of records of patients with mental disorders who were hospitalized between 2005 and 2015 in CYL.

In Table 1, we can observe the average population corresponding to each hospital region that recorded acute patients with mental health issues.

Table 1 Population average over the 11 years (2005–2015): variance, according to region and hospital in CYL

It is important to note that the 261 records from the Rio Hortega hospital in Valladolid were not included, because this hospital only began recording the information required for the present study in 2009, whereas the records of the other hospitals date back to 2005.

The remaining hospitals coincide in the data collection period from 2005 to 2015, and so their 4054 records were included in their entirety for this research.

Furthermore, CYL is a large region in Spain and has a food and agriculture sector with a turnover of around 10% of that of the rest of Spain, in which its meat, dairy, and animal foodstuff industries stand out (Invest in Spain, n.d.). Twelve percent of total Spanish energy is produced in CYL, which also boosts energy diversification and innovation in terms of renewable energies (Invest in Spain, n.d.).

Dataset Description

Patient admission records in CYL comprise 4315 records with diagnoses of acute mental disorders in public health hospitals in Castilla y Leon (SACYL) (Sacyl, 2021) between 2005 and 2015. The data are based on the minimum basic dataset (CMBD) (Melendez frigola et al., 2016) and the International Classification of Diseases, 10th revision (ICD-10) (Spain, 2021).

We applied data cleaning to obtain records of patients with diagnoses associated with suicide, which we show in detail in Fig. 2. The data relating to suicide diagnoses in CYL, which we will refer to as DBSUICIDECYL, comprise 4315 records of admissions of patients with suicide-related diagnoses. The inclusion criteria for records were acute mental health patients according to ICD-10 coding of the diagnoses selected by the authors as suicide-related disorders, shown in Table 2 and Fig. 2. Lastly, after excluding the 261 records from the Rio Hortega University Hospital, N = 4054 records were used, according to Fig. 2.

Fig. 2

Flow of inclusion and exclusion criteria: patient data associated with suicide-related diagnoses

Table 2 Variable description of DBSUICIDECYL

Variable Description

Some dichotomous variables were created to identify the most frequent diagnoses associated with suicide in DBSUICIDECYL, organized into three main themes. Other variables, such as years, are shown numerically, while age, stay days, and hospitals are categorized. For further details, see Table 2.

Table 3 shows the distribution of mental disorders, suicidal features, and somatic disorders associated with suicide as described by DBSUICIDECYL, according to that shown in Table 2. Distribution according to gender enables us to show the distribution and corresponding percentage in each group of variables included in this study (Table 3).

Table 3 Distribution and percentages of variables according to year and gender, broken down according to mental disorders, suicidal features, and somatic disorders


The decision was made in the present research to use two state-of-the-art components for selecting attributes/variables. The first corresponds to a classic statistical technique based on goodness of fit assessed by the chi-square distribution, while the second involves machine learning techniques based on three different approaches: entropy, probability, and the linear relationship of the variable. Accordingly, the CHAID algorithm (Jojoa et al., 2021) was selected for the first component, and random forest (Wang et al., 2021), logistic regression (Qasim & Algamal, 2018), and support vector machine (Jojoa-Acosta et al., 2021) for the second.

In the end, a study was carried out based on common outputs and assessment by an expert, who finally decided which would have the greatest bearing on the hospital readmissions variable from among the resulting set of variables. Details of the methods applied in the present study are provided below in Fig. 3.

Fig. 3

Methods applied

CHAID Analysis for Feature Selection

Different techniques were applied in the present study in order to compare them and thus obtain objective results about the variables that have the greatest bearing on predicting admissions of acute patients with suicide-related mental disorders. To this end, it was important to use a technique based on the statistical distribution of the data analyzed. Chi-square tests assess the goodness of fit of one set of data against another. Using the method known as chi-square automatic interaction detector (CHAID), it is possible to build a tree that helps determine how the variables combine to explain the outcome of a given dependent variable. Nominal, ordinal, and continuous data may be used in the CHAID analysis, with continuous predictors divided into categories containing approximately the same number of observations. In our case, the response variable is dichotomous, enabling CHAID to evaluate all possible cross-tabulations for each categorical predictor until the best result is obtained, i.e., until no further division can be made in any branch of the tree. The decision or classification tree starts with the identification of the target or dependent variable, which is considered the root. The CHAID analysis then divides the target into two or more categories, the child nodes, using statistical algorithms. Unlike regression analysis, the CHAID technique does not require data to be normally distributed.
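The split-selection step described above can be illustrated with a minimal sketch. It is not the CHAID implementation used in this study: it only computes the Pearson chi-square statistic (df = 1, no continuity correction) for binary predictors against a binary target and picks the strongest one, omitting the category merging and multiplicity adjustments of full CHAID. The records and the variable names `personality_disorder` and `gender` are hypothetical toy data.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (df = 1) for a 2x2 table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For df = 1 the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

def best_binary_predictor(records, target, predictors):
    """Return (name, chi-square, p) for the predictor most associated with target."""
    best = None
    for name in predictors:
        table = [[0, 0], [0, 0]]  # rows: predictor value, columns: target value
        for rec in records:
            table[rec[name]][rec[target]] += 1
        stat, p = chi_square_2x2(table)
        if best is None or stat > best[1]:
            best = (name, stat, p)
    return best

def make(pd, gender, re_entry, count):
    """Build `count` identical hypothetical records."""
    return [{"personality_disorder": pd, "gender": gender, "Re_entry": re_entry}] * count

# Hypothetical toy records: readmission depends on personality_disorder, not gender.
records = (make(1, 0, 1, 30) + make(1, 1, 1, 30) + make(1, 0, 0, 20) + make(1, 1, 0, 20)
           + make(0, 0, 1, 10) + make(0, 1, 1, 10) + make(0, 0, 0, 40) + make(0, 1, 0, 40))

best = best_binary_predictor(records, "Re_entry", ["personality_disorder", "gender"])
print(best)
```

In a tree-building procedure, the same selection would be repeated within each child node until no predictor yields a significant split.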

Machine Learning for Feature Selection

Many state-of-the-art works use classification algorithms which, via information analysis, can identify the most important attributes or variables in a prediction task. That is why we decided to apply different methods based on different linear and non-linear metrics and, using the results obtained as a whole, determine the importance of attributes when predicting admissions. For this reason, we selected three algorithms with different metrics, as objectivity was required.

Correlation Analysis

In a machine learning analysis, it is desirable for the variables being analyzed not to be correlated with each other, as dimensionality reduction is needed to prevent phenomena that may affect performance, such as problems of model fit. Therefore, a Pearson correlation coefficient analysis was initially carried out to identify variables that could be disregarded according to a team of experts.

Pearson Correlation Coefficient and Spearman Correlation Coefficient

Two matrices were created to observe the correlation between input variables: one based on the Pearson correlation coefficient (Wan et al., 2021) and one on the Spearman correlation coefficient (Ghosh et al., 2021).

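Initially, we used normalized covariance to compare the behavior of the data in the sets being studied, using central statistical moments. The formula corresponding to the Pearson correlation coefficient is shown in Eq. (1):

$$r_{XY}=\frac{\mathrm{cov}(X,Y)}{\sigma_{X}\,\sigma_{Y}}=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^{2}}}$$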

Seeking a more objective perspective, it was also decided to use the Spearman correlation coefficient (Ghosh et al., 2021) so as to ascertain the correlation behavior between variables via the two approaches mentioned:

$$r_{s}=\frac{\mathrm{cov}\left(R(X),R(Y)\right)}{\sigma_{R(X)}\,\sigma_{R(Y)}}$$

where \(R(X)\) and \(R(Y)\) denote the rank-transformed variables. Whereas the Pearson correlation coefficient seeks a linear correlation between two random variables, the Spearman correlation coefficient targets the monotonic relationship between variables, i.e., whether they increase or decrease together.
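The difference between the two coefficients can be seen in a short sketch. The functions below are a plain standard-library illustration, not the software used in this study, and the data are hypothetical: `y` is a cubic (monotonic but non-linear) function of `x`.

```python
import math

def pearson(x, y):
    """Pearson coefficient: covariance normalized by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman coefficient: Pearson coefficient of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: a monotonic but non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

For such data the Spearman coefficient reaches 1 because the ordering is preserved, while the Pearson coefficient stays below 1 because the relationship is not linear.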

Non-correlated Variable Selection

Once the aforementioned procedures have been completed, the variables with a high correlation value are identified, i.e., those whose correlation coefficients exceed certain given thresholds. To this end, thresholds of 0.8, 0.7, and 0.6 were established to create non-correlated subunits from which to select attributes. A block diagram of this procedure is shown in Fig. 4.

Fig. 4

Block diagram showing the creation of subunits based on Pearson and Spearman correlation coefficients
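One possible way to build such subunits is sketched below. The greedy rule (keep a variable only if its absolute correlation with every variable already kept does not exceed the threshold) is an assumption made for illustration, since in this study the final decision on which variables to disregard rested with the team of experts. The variable names and correlation values are hypothetical.

```python
def uncorrelated_subset(names, corr, threshold):
    """Greedily keep variables whose |correlation| with all kept ones is <= threshold."""
    kept = []
    for name in names:
        if all(abs(corr[frozenset((name, other))]) <= threshold for other in kept):
            kept.append(name)
    return kept

# Hypothetical pairwise correlation coefficients (Pearson or Spearman).
names = ["age", "stay_days", "alcohol_abuse", "drug_poisoning"]
corr = {
    frozenset(("age", "stay_days")): 0.75,
    frozenset(("age", "alcohol_abuse")): 0.10,
    frozenset(("age", "drug_poisoning")): -0.05,
    frozenset(("stay_days", "alcohol_abuse")): 0.20,
    frozenset(("stay_days", "drug_poisoning")): 0.15,
    frozenset(("alcohol_abuse", "drug_poisoning")): 0.65,
}

# One subunit per threshold used in the text.
subunits = {t: uncorrelated_subset(names, corr, t) for t in (0.8, 0.7, 0.6)}
for threshold, variables in subunits.items():
    print(threshold, variables)
```

Lower thresholds produce smaller subunits, because more pairs of variables count as correlated.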

Recursive Feature Selection Based on Machine Learning

Machine learning (ML) techniques are currently widely used for attribute selection tasks (Munasinghe & Karunanayake, 2021), among other applications, as they can identify the main features for predicting a selected response variable. Different machine learning techniques were used in this work to find the attributes that most affected admission response behavior in hospitals in the autonomous region of Castilla y León. The analysis was conducted for all data as a whole and for each hospital on an individual basis. The algorithms used were:

  • Support vector machine (Casalicchio et al., 2018).

  • Random forest (Kirasich et al., 2018).

  • Logistic regression (Guo et al., 2021).

Random Forest as an Attribute Selector Algorithm

Random forest is a set of decision trees that uses the technique known as bagging to increase generalization capacity and reduce variance in the performance metrics required. It is one of the most widely used algorithms in industry and is commonly applied to determine the importance of attributes. The algorithm is based mainly on the entropy, shown below, of the data used for each tree, where \({P}_{i}\) is the proportion of samples belonging to class \(i\); this is used to determine the variables that provide the most information for the classification task.

$$\mathrm{Entropy}= -\sum_{i}{P}_{i}\,{\log}_{2}({P}_{i})$$
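As an illustration of how entropy can rank attributes, the sketch below computes the entropy of a set of class labels and the information gain (entropy reduction) obtained by splitting on a feature. It is a simplified, assumed example: the labels and the two features are hypothetical, and a real random forest aggregates such impurity reductions over many trees.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the class proportions in `labels`."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `feature`."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [label for f, label in zip(feature, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical readmission labels and two binary features.
labels = [1, 1, 0, 0]
informative = [1, 1, 0, 0]    # separates the classes perfectly
uninformative = [1, 0, 1, 0]  # carries no information about the class

gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
print(entropy(labels), gain_informative, gain_uninformative)
```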

Bagging thus combines several machine learning models to improve the overall performance metrics of the system being used. A block diagram of the model used is shown in Fig. 5.

Each \({DT}_{n}\) block corresponds to a decision tree trained on a separate part of the data; at inference time, the outputs of the trees are combined in the final bagging block to provide an agreed output, as shown in Fig. 5.

Fig. 5

Bagging block or ML
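The bagging scheme in Fig. 5 can be sketched as follows. This is an assumed, simplified illustration rather than the model trained in this study: each \({DT}_{n}\) is reduced to a one-level tree (a stump) over binary features, each stump is trained on a bootstrap sample, and the final output is a majority vote. The data are hypothetical, and the random feature subsampling of a full random forest is omitted.

```python
import random
from collections import Counter

def train_stump(rows, labels):
    """One-level tree: pick the binary feature (and polarity) with the fewest errors."""
    best = None
    for j in range(len(rows[0])):
        for polarity in (0, 1):
            errors = sum((row[j] ^ polarity) != y for row, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, j, polarity)
    _, j, polarity = best
    return lambda row: row[j] ^ polarity

def bagging_fit(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        models.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Majority vote over the individual models."""
    votes = Counter(model(row) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical data: the label equals the first feature; the second is irrelevant.
rows = [(1, 0), (1, 1), (0, 0), (0, 1)] * 5
labels = [row[0] for row in rows]

models = bagging_fit(rows, labels)
print(bagging_predict(models, (1, 0)), bagging_predict(models, (0, 1)))
```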

Multinomial Logistic Regression as an Attribute Selector Algorithm

This is a regression technique used to predict a categorical variable. In the present work, it was used to predict a patient’s admission to hospital and thus determine which attributes mainly interact to predict it.

Its functioning is based on data analysis according to multinomial distribution, as shown below:

$$f\left({x}_{1},\dots ,{x}_{k}\right)=\frac{n!}{{x}_{1}!\cdots {x}_{k}!}\;{p}_{1}^{{x}_{1}}\cdots {p}_{k}^{{x}_{k}}$$

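From this, the logarithm of the odds, or logit, can be obtained, as shown below:

$$\mathrm{logit}\left({p}_{i}\right)=\ln\left(\frac{{p}_{i}}{1-{p}_{i}}\right)={\beta }_{0}+{\beta }^{T}{X}_{i}$$

where \({p}_{i}\) is the probability of the outcome \({Y}_{i}\) and \(\beta\) denotes the regression coefficients. This represents the incidence of the attribute array \({X}_{i}\) on the response variable \({Y}_{i}\), making use of the Softmax function shown in Eq. (3) in this polychotomous case. In this specific case, \({Y}_{i}\) corresponds to the dependent variable Re_entry, and the sets \({X}_{i}\) are as described in the “Materials” section.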

$$\mathrm{Softmax}{(x)}_{i}= \frac{{e}^{{x}_{i}}}{{\sum }_{j}{e}^{{x}_{j}}}$$
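For illustration, the Softmax function above can be written in a few lines. The sketch subtracts the maximum score before exponentiating, a common numerical-stability choice that does not change the result; the input scores are hypothetical.

```python
import math

def softmax(scores):
    """Map a list of real-valued scores to probabilities that sum to 1."""
    highest = max(scores)
    # Shifting by the maximum avoids overflow and leaves the ratios unchanged.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical class scores (linear predictors) for one patient record.
probabilities = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probabilities])
```

With two classes and scores \((z, 0)\), the first output equals the logistic function of \(z\), which links the Softmax function to the logit.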

Support Vector Machine as an Attribute Selector Algorithm

This is one of the most widely used algorithms in classification problems and categorical regression. Its simplicity and computational efficiency make it well suited to applications that require rapid and highly reliable results. It is based on margin maximization (the distance between the support vectors of the data being used) to find a separating hyperplane during the training stage. At inference time, the position of each individual vector relative to the hyperplane is evaluated to define the extent to which it belongs to each class. The model weights can be accessed once the algorithm has been trained.

$$H={W}^{T}\times X+B$$

\({W}^{T}\) is the weight vector, whose direction is orthogonal to the hyperplane being sought. The importance of each feature can therefore be determined by comparing the magnitudes of these coefficients with each other. As such, by observing the SVM coefficients, it is possible to identify the main features used in classification and disregard those deemed unimportant (which are subject to greater variance).
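The idea of ranking features by the size of their coefficients can be sketched with a small linear SVM. This is an assumed illustration only: the model is trained by full-batch subgradient descent on the regularized hinge loss, the hyperparameters `lam`, `lr`, and `epochs` are arbitrary, and the two features (already on the same scale) are hypothetical. It does not reproduce the SVM configuration used in this study.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Subgradient descent on the L2-regularized hinge loss; labels are -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # the point lies inside the margin or is misclassified
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Hypothetical data: the class depends only on the first feature.
feature_names = ["feature_a", "feature_b"]
X = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
y = [1, 1, -1, -1]

w, b = train_linear_svm(X, y)
# Rank features by the absolute value of their coefficient in W.
ranking = sorted(zip(feature_names, w), key=lambda item: abs(item[1]), reverse=True)
print(ranking)
```

Comparing coefficient magnitudes in this way is only meaningful when the features share a comparable scale and the decision function is linear in the features.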

Reducing the number of attributes in machine learning plays a very major role, especially when large datasets are being worked on. Indeed, this can speed up training, prevent overfitting and, ultimately, lead to better classification results thanks to the noise reduction in data.

It is important to highlight that attribute selection is undertaken using the intersection of the variables obtained, i.e., the results common to the outputs of the attribute selection algorithms used. It should also be stressed that the implicit order was not taken into account in the case of the algorithm that showed the best performance.
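The intersection rule can be expressed directly with sets, as in the minimal sketch below. The selected variables listed for each algorithm are hypothetical placeholders, not the results reported in this study.

```python
def common_attributes(selections):
    """Return the attributes selected by every algorithm, ignoring their order."""
    sets = [set(selected) for selected in selections.values()]
    return sorted(set.intersection(*sets))

# Hypothetical outputs of the three attribute selectors.
selections = {
    "random_forest": ["personality_disorder", "alcohol_abuse", "age", "gender"],
    "logistic_regression": ["alcohol_abuse", "personality_disorder", "suicidal_ideas"],
    "svm": ["gender", "personality_disorder", "alcohol_abuse"],
}

common = common_attributes(selections)
print(common)
```

Because sets are unordered, the ranking produced by each individual algorithm is discarded, which mirrors the fact that the implicit order was not taken into account.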

Attribute Selection with Clinical Meaning by a Clinical Expert

Lastly, the results obtained were shown to the psychiatrist to enable them, through their own knowledge and experience, to determine the validity of the results provided (Bennasar et al., 2015). Thus, the aim was to eliminate any bias caused by the machine learning algorithms and by the data used in the present study.


Results

In this study, we aimed to analyze the variables that influence admissions in CYL hospitals. The Re_entry variable took the value 1 when the patient was readmitted to CYL hospitals during the period from 2005 to 2015.

CHAID Analysis

Table 4 shows the classification trees created using CHAID for each hospital to determine the variables that influence CYL admissions. The reality facing each hospital differs from region to region, which is why the results are explained in alphabetical order:

  • Avila: The variable explaining admissions (Re_entry) is suicidal ideas, with chi-square = 22.831, p = 0.00, and df = 1, accounting for 36% of node 1, with 46.6% of patients being admitted.

  • Burgos: According to CHAID, admissions in Burgos are explained by years. Years before 2008 (node 1) account for 32%; admissions in the years 2008 to 2011 are explained by suicidal ideas, with p = 0.03, chi-square = 4.719, and df = 1, accounting for 51%, of which 63% of admissions are explained by node 7. Additionally, years after 2011 are accounted for by personality disorder, accounting for 38% of admissions according to node 9.

  • In the province of Leon, we have data at our disposal from the Leon Welfare Complex (LeonCA) and El Bierzo Hospital. The CHAID analysis of LeonCA indicates that depressive syndrome is the disorder that most influences admissions, with 28%, p = 0.001, chi-square = 12.075, and df = 1. In the case of El Bierzo Hospital, gender proves to be the main influential variable, with p = 0.00, chi-square = 13.329, and df = 1, followed, for women in node 2, by personality disorder, accounting for 44% of admissions with p = 0.018, chi-square = 5.594, and df = 1 which, according to node 4, accounts for 55% of readmissions.

  • In Palencia, personality disorder is the variable that most influences readmissions according to CHAID, as shown in Table 4, accounting for 48% with p = 0.00, chi-square = 28.161, and df = 1 from node 0; 71% of the same cases are then accounted for in node 2.

  • In Salamanca, the variable with the most influence on readmissions is gender, with p = 0.006, chi-square = 7.505, and df = 1 according to CHAID, accounting for 26% in node 0; in node 1, alcohol abuse among men is the next most influential variable, with p = 0.02, chi-square = 5.378, and df = 1.

  • In Segovia, the most influential variable accounting for readmissions is personality disorder, with p = 0.00, chi-square = 34.128, and df = 1, accounting for 42% of them according to node 0 from the CHAID analysis.

  • In Soria, the most influential variable accounting for readmissions is adjustment disorder, with p = 0.005, chi-square = 7.958, and df = 1, which accounts for 34% according to node 0 from the CHAID analysis.

  • In Valladolid, no results were obtained from the CHAID analysis as we only included data from the Valladolid University Hospital, and this failed to produce any result.

  • In Zamora, the variable that most influences admissions is alcohol abuse, with p = 0.00, chi-square = 18.038, and df = 1, accounting for 31% of readmissions.

Table 4 CHAID hospitals

According to Table 4, in CYL, the most influential variable explaining readmissions when combining all hospitals from Figures 1 and 2 was personality disorder, with p = 0.00, chi-square = 118.715, and df = 1, accounting for 36% of admissions according to node 0 from the CHAID analysis. Additionally, gender in node 2 proves influential: for men, admissions are accounted for by node 6, with suicidal ideas accounting for 43%, with p = 0.0037, chi-square = 4.372, and df = 1, while for women, readmissions are accounted for by node 9, with dysthymic disorder accounting for 57% of readmissions, with p = 0.024, chi-square = 5.15, and df = 1.

In short, the CHAID analysis enables us to rapidly display the most important relationships between variables, thus allowing researchers to recognise and identify profiles, in this case the behavior of CYL hospitals between 2005 and 2015 in terms of readmissions of acute patients with suicide-related mental disorders.

ML for Feature Selection

Correlation Analytics

We start this stage of the study with the results of the correlation analysis conducted on all the variables using the Pearson and Spearman correlation coefficients, which are provided in the complementary material for all hospitals included in this study. All the variables explained in Table 2 are compared following the methodology in Figure 3. According to the review by the team of experts, the correlation analysis for each subunit of hospital data shows that no pair of variables has a correlation with an absolute value above 0.6; i.e., no variable was eliminated from Table 2 owing to correlation.

According to Table 5, we can display those variables that prove most influential in readmissions to CYL hospitals. The analysis was conducted on all data as a whole (CYL), and for each hospital on an individual basis. Different machine learning techniques were applied in this work to find the attributes that most affect behavior regarding response to readmissions associated with mental disorders of acute patients. We can see how each mental disorder variable varies according to region, and how methods or variables that make up suicidal features vary. Generally speaking, the most influential variables for the purpose of this study according to ML in CYL are adjustment disorder, alcohol abuse, depressive syndrome, personality disorder, and dysthymic disorder in terms of mental disorders; regarding methods, the most influential variables would be benzodiazepine poisoning, suicidal ideas, drug poisoning, antipsychotic poisoning, and suicide and/or self-harm from jumping. Lastly, we can see how age and gender influence hospital readmissions.

Table 5 ML hospitals

Comparison Between CHAID and ML

Table 6 presents the metrics for the ML models, using a split of 80% for training and 20% for testing, organized as follows:

  1. A classification model based on a 2-layer artificial neural network was used, with the hyperparameters shown in Table 6, which were obtained by grid search for the ML model corresponding to readmissions at all CYL hospitals shown in Tables 5 and 7.

  2. A classification model based on kernel methods was used, namely a support vector machine (SVM) with a radial basis function (RBF) kernel, as shown in Tables 5 and 7.
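The evaluation protocol can be sketched as below. This is an assumed illustration: the shuffling seed, the helper names `train_test_split` and `classification_metrics`, and the choice of accuracy, precision, recall, and F1 are hypothetical, since the exact metrics are those listed in Table 6.

```python
import random

def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and hold out `test_fraction` for testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = readmitted)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical record identifiers and predictions.
train, test = train_test_split(list(range(100)))
metrics = classification_metrics([1, 1, 0, 0], [1, 0, 0, 0])
print(len(train), len(test), metrics)
```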

Table 6 ML metrics for CYL from Tables 5 and 7

Attribute Selection by a Clinical Expert

The results shown in Tables 5 and 7 were assessed and validated by the clinical expert. The expert only pointed out that the variables associated with somatic disorders were not taken into consideration because they are consequences of behavior rather than a cause, and, based on their experience in the field, deemed the results associated with mental disorders suitable.

Table 7 Comparison of results between CHAID and ML according to hospitals

Table 7 shows the results comparing CHAID with ML for each hospital in this region, enabling us to display any coincidences or differences among the variables selected that explain hospital admissions. It is interesting to note that in the case of ML, several variables emerge that are associated with mental disorders and suicidal features. Similarly, the following variables coincide in the case of CHAID and ML: personality disorder, dysthymic disorder, and suicidal ideas, as well as gender in the case of the CYL data set as a whole.

It is important to highlight that these algorithms work in different ways, which, together with the inherent complexity of suicide, helps to explain the wide variety of results from each region.


Discussion

The chi-square analysis is a method that is widely used to identify the attributes that have the greatest bearing on accounting for a response variable under study. In Pourmand et al. (2021), this method is explained as being used to select attributes based on the statistical distribution of data, although, in many applications, the probability distribution function is unknown and difficult to estimate (Berenfeld & Hoffmann, 2021). That is why a decision was made to use a tree-based iterative method, the CHAID analysis (Jojoa et al., 2021), complemented by machine learning-based attribute selection, as the idea was to improve generalisation of the results obtained for these types of clinical issues in which attribute selection is of great importance. Similarly, Jojoa-Acosta et al. (2021) propose the sole use of machine learning methods and assessment by experts; our work differs mainly in the recursive selection of attributes and the addition of conventional statistical methods, while preserving selection by experts in the field to validate the outputs of the algorithms used.

Feature selection is an even more challenging task than prediction metrics, which is why there are several methods to carry it out (Gupta et al., 2022). There is a wide field to investigate the use of supervised learning techniques in problems related to stress (Kaur et al., 2022; S. Sharma et al., 2021) and other human behavior disorders using emerging ML techniques (Monga et al., 2022). During COVID-19, ICTs were used more intensively to perform physical and mental monitoring using ML techniques, for example (Pandey et al., 123 C.E.; M. Sharma et al., 2020).

One of the main limitations of the model used is the impossibility of observing the direction of the effect of each factor on the dependent variable, in this case hospital admissions. This results from the fact that random forest and logistic regression use entropy and probability metrics, respectively, which never take negative values. This in turn limits the conclusions that can be drawn: the variables that have the greatest bearing on the response variable are identified, but not the direction of their contribution, i.e., whether it is positive or negative with regard to hospital admissions. Furthermore, the importance associated with the order of the variables is not taken into account, since common results are used as the main decision-making rule for selecting the variables deemed most important in accounting for hospital admissions.

According to You et al. (2020), policy- or strategy-driven interventions should be organized by central governments and regions, taking regional features into account to help lower local suicide rates more effectively.

According to Table 7, the variables selected by ML that influence admissions in CYL public hospitals associated with mental disorders are found in the following:

  • Adjustment disorder, which features in hospitals in Avila, El Bierzo, Palencia, Salamanca, Segovia, Soria, Valladolid, and Zamora. CHAID coincides in the case of the Soria hospital in terms of this disorder. One study (Fegan & Doherty, 2019) suggests that there is a close association between adjustment disorder and suicidal behavior, while another (P. Casey et al., 2015) associates adjustment disorder with suicide as a risk disorder, although the limitations of that study did not allow its findings to be generalized to all psychiatric settings.

  • Alcohol abuse is a variable that is selected for hospitals in Burgos, Leon, Palencia, Segovia, Soria, and Zamora as influencing admission of acute patients associated with suicide. CHAID coincides with the case of hospitals in Salamanca and Zamora in terms of this disorder. In a study conducted in Australia, it is maintained that aggressive behavior, comorbidity with other psychiatric disorders and recent interpersonal conflicts such as break-up and family conflicts may lead to suicide in individuals who suffer from alcohol abuse (Kõlves et al., 2017). Research (Conner & Bagge, 2019) indicates that alcohol abuse increases the risk of suicidal behavior, while the use of drugs such as opioids is a close predictor of attempts at suicide (Marengo et al., 2021) in some regions.

  • Depressive syndrome is an influential variable in the cases of Avila, Leon, Palencia, Valladolid, and Zamora. CHAID coincides with the case of the Leon hospital in terms of this disorder. In this study (Revappala et al., 2021), it turned out that this was the most common disorder among mood disorders involving suicide attempts.

  • Personality disorder is an influential variable in Avila, Burgos, Leon, El Bierzo, Palencia, Salamanca, Segovia, Soria, and Valladolid. CHAID coincides with the case of the Palencia hospital in terms of this disorder. This study (Doyle et al., 2016) found a 20-fold increase in the risk of suicide among patients with personality disorder in comparison to those without any such recorded psychiatric disorder.

  • Dysthymic disorder is considered an influential variable in Avila, Burgos, El Bierzo, Soria, Valladolid, and Zamora. The recurring dysthymic disorder would appear to lead to a greater risk of suicide (Witte et al., 2009).

  • Schizophrenia appears as an influential variable in Burgos, Leon, Palencia, Salamanca, Segovia, Soria, and Valladolid. There is a significant link between schizophrenia and suicide in China (Lyu et al., 2021) and schizophrenic suicides involved a greater intention to commit suicide than those without it (Lyu & Zhang, 2021).

  • Bipolar disorder is an influential variable in Avila, Burgos, Leon, El Bierzo, Palencia, Salamanca, Segovia, Soria, and Zamora. There have been studies on bipolar disorder and suicide to help understand clinical and demographic factors (Miller & Black, 2020).

  • State of anxiety is an influential variable in Avila, El Bierzo, Palencia, Segovia, Soria, Valladolid, and Zamora. Mental health diagnoses such as anxiety and/or depression are closely associated with suicide among university students (S. M. Casey et al., 2022).

According to Table 7, the variables associated with methods or suicidal features selected by ML that influence admissions in CYL public hospitals are found in the following:

  • Benzodiazepine poisoning is considered an influential variable in Avila, Leon, El Bierzo, Salamanca, Segovia, Valladolid, and Zamora. The high proportion of this type of intentional poisoning, which includes diagnoses of mental health disorders among young women, highlights the importance of assessing mental health and the risk of suicide in emergency services deriving from suitable monitoring, according to this research (Bushnell et al., 2021).

  • Suicidal ideas are considered an influential variable in Avila, Burgos, Leon, El Bierzo, Palencia, Salamanca, Segovia, Valladolid, and Zamora. CHAID coincides with the case of the hospitals in Avila and Burgos in terms of this variable. In this research (Chapman et al., 2015), it is pointed out that the association of suicidal ideas with subsequent suicide needs to be cautiously interpreted, owing to the great heterogeneous nature of studies and associated disorders.

  • Drug poisoning is considered an influential variable in Avila, Burgos, Leon, El Bierzo, Palencia, Salamanca, Segovia, Soria, Valladolid, and Zamora. A cross-sectional study (Shiels et al., 2020) found that demographic features and geographic patterns varied according to the cause of death, which suggests that the increase in deaths from this cause and from alcohol abuse is not concentrated within a single group or region.

  • Suicide by psychotropics is considered an influential variable in Burgos, Leon, El Bierzo, Palencia, Salamanca, Segovia, Soria, and Zamora. One study comparing cases of self-poisoning with psychotropics (Pfeifer et al., 2020) found that the results depend on the region studied.

  • Antipsychotic poisoning is considered an influential variable in Avila, Burgos, Leon, El Bierzo, Salamanca, Soria, and Valladolid. Research on this subject (Ferrey et al., 2018) found little difference in the toxicity of individual mood stabilizers.

  • Suicide and/or self-harm from jumping is considered an influential variable in Burgos and Salamanca. The characteristics of individuals who jump from different places differ little (Bennewith et al., 2011; Gunnell & Nowers, 1997).

  • Of the other variables considered influential according to Table 7, the years variable accounted for admission behavior only in Burgos, as shown by the CHAID analysis; for the other hospitals, this variable was not selected by the methodology proposed in the study. Specifically, the economic crisis (Mattei et al., 2019) and the increase in unemployment (Chang et al., 2013; López-Contreras et al., 2019) are considered important risk factors for suicide (Demirci et al., 2020). In general, this risk is thought to be accentuated in situations of economic uncertainty (Vandoros et al., 2019), or when family poverty worsens, especially if associated with previous mental health problems (Pan et al., 2013). Consequently, given that a collapse in world trade leading to a major economic crisis has been forecast as a result of the pandemic (Slater, 2020), suicide rates will foreseeably be affected, as occurs with all disasters (Mannix et al., 2020).

  • Age is another variable considered influential in the case of hospitals in Avila, Leon, El Bierzo, Palencia, Salamanca, Segovia, Soria, and Valladolid, according to the results obtained in Table 5. Da Veiga and Saraiva (2003) discuss the practical implications in the context of previous theories that relate age patterns in suicide to sociological and economic dimensions.

  • Gender is a further variable considered influential in the case of hospitals in Avila, Burgos, El Bierzo, Palencia, Salamanca, Segovia, Soria, Valladolid, and Zamora, according to the results obtained by ML in Table 7. The CHAID analysis also agrees on this variable for the El Bierzo and Salamanca hospitals. Given the differences in suicidal intent between men and women highlighted by Freeman et al. (2017), gender-oriented prevention and intervention strategies would be recommended, since this variable proves influential in accounting for admissions of acute patients associated with suicide.


DBSUICIDECYL contains the records of acute patients with suicide-related mental disorders and represents one of the most diverse cohorts in the country. The nature of this study also limits the records to those of patients meeting the inclusion criteria shown in Table 2.

The data analyzed here (BDSUICIDECYL) have been anonymised, so the patients' socio-economic data are not known. However, the data cover the period from 2005 to 2015, which includes the 2008 financial crisis. According to Roca et al. (2013), assessing the relationship between suicide and the economic crisis requires looking first at the underlying diseases and only later at their consequences, i.e. suicide and the use of health services, rather than focusing on immediate suicide rates. That is why we focused on mental disorders and suicidal features when carrying out this study.


The relevance of this study lies in showing, for each hospital and for the entire region, the variables chosen by feature selection together with their metrics, which allows us to understand patients who are readmitted with suicidal behavior.

In the ML analysis of CYL hospital readmissions of acute patients with suicide-related mental disorders between 2005 and 2015, we found influential variables such as adjustment disorder, alcohol abuse, depressive syndrome, personality disorder, and dysthymic disorder.

Among the methods or features associated with suicide over the same period, the ML analysis identified influential variables such as benzodiazepine poisoning, suicidal ideas, drug poisoning, antipsychotic poisoning, and suicide and/or self-harm from jumping. Other influential variables that we found in this study were age and gender.

For its part, the CHAID analysis agreed with ML on the following influential variables: personality disorder, gender, suicidal ideas, and dysthymic disorder.

The results obtained show that it is necessary to continue investigating the various factors that affect suicide; in this case, we addressed the hospital management of readmissions. Other areas of research can also contribute to suicide prevention and to knowledge of this problem, for example initiatives such as training activities for professionals (Castillo-Sánchez et al., 2019) and mindfulness therapies in times of COVID (Castillo-Sánchez et al., 2022).

Future work is expected to examine suicide-related mental disorders in the same region over the years 2016 to 2020.