Using Machine Learning in Burnout Prediction: A Survey

Accurate prediction provides a number of important benefits for research and decision-making. Occupational burnout is intertwined with individual, cultural, and social factors, and addressing it requires methods that can deal with large amounts of data. The application of such methods is a relatively novel research area in social science. To this end, this article presents insights into machine learning methods, mainly those related to prediction tasks, and briefly reviews their application in the burnout domain. It shows that the choice of a method depends on whether a dependent (target) variable is present. The paper also compares novel and traditional approaches, showing that the appropriateness of a technique depends on the aim of the research. The theoretical and practical implications of using machine learning methods in this context are also presented. The review identifies a gap in the study of burnout that requires the attention of social work researchers. Machine learning techniques can be used to create new theoretical models of burnout. These algorithms can also provide new approaches to designing data-driven interventions. Burnout monitoring systems supported by machine learning algorithms can also be used in recruitment processes and to supervise employees. Applying machine learning methods to reduce burnout can also provide socio-economic benefits, such as helping to reduce employee turnover and improve general working conditions.

The meta-analysis of burnout shows that many factors, from the individual level to the macroeconomic level, influence the perceived level of this syndrome. Each individual may have a complex combination of factors, so there is a need to deal with large amounts of data. Applying machine learning to data from various sources (e.g. EEG, sensors, questionnaires) might help to identify and predict burnout in workers. The results obtained with these methods can be more accurate because they offer more techniques for examining non-linear relationships in the data than traditional statistical techniques do.
Burnout is usually assessed in white-collar workers (e.g. civil servants) and in the helping professions, such as caregivers, teachers (Kim et al., 2011; Valente et al., 2011; Vercambre et al., 2009) and social workers in child welfare practice (Lizano & Mor Barak, 2012; McFadden, Campbell & Taylor, 2014; Travis & Mor Barak, 2010). Working under considerable workload pressure can lead to poor decision-making and a tendency to take a short-sighted view of issues (Keller & Ho, 1988). McGee (1989) showed that social workers with higher levels of this syndrome made decisions earlier and were less likely to change their minds, even in the face of new facts.
Burned-out personnel cope with stress by denying the need to engage with particularly demanding cases. It is therefore important to address this syndrome, since employees' behaviour may have serious consequences, especially for young clients.
Job burnout is a syndrome of emotional exhaustion, depersonalization, and reduced feelings of personal accomplishment. A key dimension is emotional exhaustion, in which people feel overwhelmed by work and are physically and emotionally fatigued. A second dimension, depersonalization, means that workers develop cynical feelings and attitudes towards their clients. The last dimension is reduced personal accomplishment, in which workers feel dissatisfied and view their work negatively (Maslach et al., 1996). A more general definition from the International Classification of Diseases, 11th revision (ICD-11), links burnout to stress: the World Health Organization regards burnout as the result of chronic workplace stress that has not been successfully managed (https://www.who.int/mental_health/evidence/burn-out/en/).
Most previous work describes relationships between burnout and factors such as job satisfaction (Stalker et al., 2007; Chen & Scannapieco, 2010), turnover and retention (Barak, Nissly & Levin, 2001; Kyonne, 2009; O'Donnell & Kirkner, 2009; Schwartz, 2008), and the work environment (Perrone, 2007). Some articles focus on predicting burnout with statistical models, mainly logistic regression. The main aim of this paper is to present new methods of burnout prediction based on machine learning techniques and to compare them with the conventional statistical approach. To date, only a few studies have used machine learning to predict burnout; most machine learning work in this area has focused on mood and health disorders, e.g. depression, anxiety, or suicide prevention. This paper therefore summarizes the use of these methods to provide new knowledge to social work researchers. The paper is organized as follows. It begins with a review of the traditional approach to predicting occupational burnout. Next, machine learning methods are presented and differentiated. Subsequently, applications of machine learning to burnout prediction are reviewed. Finally, the presented information and its implications are summarized.

Conventional, Statistical Methods
The use of data in social work practice has grown rapidly in recent decades. Søbjerg et al. (2020) indicate that a statistical approach should also be used when making decisions about clients. Explanation, prediction, and description are the main purposes of statistical modelling (Hanna, 1969). According to Shmueli (2011), the same method can serve different purposes; for example, a linear regression model is descriptive if it is used to represent the association between the dependent and independent variables, explanatory when it is used for causal inference, and predictive when the aim is to predict new or future observations. Descriptive analysis focuses on summarizing or representing the sample but cannot explain why a phenomenon has occurred (Punch, 2005). For example, a social work researcher may want to examine students' addiction to their phones; descriptive research might aim to describe how many hours they spend using them (Decarlo, 2018). In articles on burnout, descriptive statistics are used to show sample characteristics and burnout rates (Hamama, 2012). They can also be used to present the strength and direction of correlations between the dimensions of burnout and other variables (Tong, Wang & Peng, 2015).
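The two descriptive uses mentioned above, a burnout rate and the strength and direction of a correlation, can be sketched in a few lines. The scores, caseloads, and the cutoff of 27 are hypothetical values chosen for illustration; they are not taken from any cited study, and real cutoffs depend on the instrument used.

```python
from statistics import mean, pstdev


def burnout_rate(scores, cutoff):
    """Share of respondents whose score is at or above a (hypothetical) cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)


def pearson_r(x, y):
    """Pearson correlation: sign gives direction, magnitude gives strength."""
    mx, my = mean(x), mean(y)
    # Population covariance, consistent with the population standard deviations below
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


# Hypothetical sample: emotional exhaustion scores and weekly caseloads
exhaustion = [12, 18, 25, 30, 22, 35, 15, 28]
caseload = [10, 14, 20, 24, 18, 30, 12, 22]

print(f"mean exhaustion score: {mean(exhaustion):.1f}")
print(f"burnout rate: {burnout_rate(exhaustion, cutoff=27):.2f}")
print(f"r(exhaustion, caseload): {pearson_r(exhaustion, caseload):.2f}")
```

Such summaries describe the sample only; a strong correlation here says nothing about why exhaustion and caseload move together.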
Multivariate statistical analyses (MVAs) make it possible to analyse relationships among many variables. These methods can be divided according to whether or not they specify a dependent variable (Manly, 2005; Rosenblad, 2017). Several techniques are presented below, broken down by explanatory or predictive purpose.
Explanatory modelling is concerned with testing causal hypotheses, whereas predictive modelling aims to predict new or future observations. Currently, statistical modelling is used mostly for causal explanation, at the expense of predictive modelling. Moreover, many disciplines assume that predictive power (minimizing both estimation error and bias) can be inferred from explanatory power (minimizing bias). This assumption can lead to incorrect conclusions, as numerous studies confirm (Shmueli, 2011).
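In practice, predictive power is judged on observations the model was not fitted to, rather than on in-sample fit. A minimal sketch of this idea, assuming hypothetical working-hours and exhaustion data and a simple least-squares line (neither comes from the cited studies), scores the same model in-sample and on held-out cases:

```python
def fit_line(x, y):
    """Ordinary least squares estimates (intercept, slope) for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def mse(x, y, a, b):
    """Mean squared prediction error of the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)


# Hypothetical data: weekly working hours and exhaustion scores
hours = [40, 55, 45, 60, 42, 50, 48, 52]
exhaustion = [15, 33, 20, 38, 18, 25, 24, 27]

# Fit on the first six observations; treat the last two as "new" cases
train_x, train_y = hours[:6], exhaustion[:6]
test_x, test_y = hours[6:], exhaustion[6:]

a, b = fit_line(train_x, train_y)
print(f"in-sample MSE:     {mse(train_x, train_y, a, b):.2f}")
print(f"out-of-sample MSE: {mse(test_x, test_y, a, b):.2f}")
```

Only the second number speaks to predictive power; a model chosen for its in-sample fit can still predict new cases poorly.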
The most popular technique is structural equation modelling (SEM) and its variants. These models represent, estimate, and test relationships between latent and measured variables. They are closely grounded in theory, which makes it possible to test theoretical constructs. To test theoretical models of burnout, such as the job demand-control-support (JDCS) model of job stress and the job demands-resources (JD-R) model of burnout, Kim & Stoner (2008) examined the effects of job autonomy, role stress, and social support in predicting burnout and turnover intention among social workers. Role stress had a positive effect on burnout, whereas social support and job autonomy had a negative effect on turnover. Path analysis is a type of structural equation modelling. For example, path analysis was performed to test the hypotheses of a conceptual model of burnout in a longitudinal study (Travis et al., 2016). The main results showed that work-family conflict and role conflict were significant predictors of emotional exhaustion, whereas the relationship between role ambiguity and burnout was non-significant.
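Path analysis with observed (not latent) variables can be sketched from a correlation matrix alone. The example below assumes a hypothetical model X → M → Y with an additional direct path X → Y; the labels echo the predictors mentioned above, but the correlations are invented for illustration. It is not the model estimated by Travis et al. (2016) or Kim & Stoner (2008), and full SEM with latent variables requires dedicated software.

```python
def path_coefficients(r_xm, r_xy, r_my):
    """Standardized path coefficients for X -> M -> Y plus a direct X -> Y path.

    a       : effect of X on M (simple regression, so it equals r_xm)
    b       : effect of M on Y, controlling for X
    c_prime : direct effect of X on Y, controlling for M
    """
    denom = 1 - r_xm ** 2
    a = r_xm
    b = (r_my - r_xy * r_xm) / denom
    c_prime = (r_xy - r_my * r_xm) / denom
    return a, b, c_prime


# Hypothetical correlations: X = role conflict, M = work-family conflict,
# Y = emotional exhaustion (illustrative values only)
a, b, c_prime = path_coefficients(r_xm=0.40, r_xy=0.45, r_my=0.50)
indirect = a * b            # effect of X on Y transmitted through M
total = c_prime + indirect  # decomposition of the total effect

print(f"a = {a:.3f}, b = {b:.3f}, direct c' = {c_prime:.3f}")
print(f"indirect = {indirect:.3f}, total = {total:.3f}")
```

For standardized variables, the direct and indirect effects always sum to the original correlation between X and Y, which is a useful sanity check on any path model of this form.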
Logistic regression is commonly used for prediction. It differs from linear regression in that the dependent variable is binary, and it models the probability of a given event. For example, Stack (2004) showed that being a social worker increased the odds of suicide by 55.6% compared with the rest of the working-age population. Sometimes this model is used only to detect variables that affect an event: Strolin-Goltzman et al. (2008) used logistic regression to assess the effect of organizational factors on turnover intentions in public child welfare systems. Three of the five variables were statistically significant: salary and benefits, lack of other job options, and clarity of the tasks performed. In longitudinal studies, Cox regression is used. This model belongs to the class of survival models, which estimate the likelihood of an event occurring at a given time, expressed through hazard ratios. In research on occupational burnout, it is mainly used to study relationships between the dependent and independent variables over time. For example, Mohren et al. (2003) used this technique to study burnout as a risk factor for common infections. The results showed that the Exhaustion subscale of the MBI-GSS was the strongest predictor of the infections studied.
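The odds ratio reported by logistic regression can be illustrated with a small sketch. It assumes a single hypothetical binary risk factor and invented counts, and fits the model by plain gradient ascent on the mean log-likelihood, a simple stand-in for the Newton-type solvers used by statistical packages. exp(b1) is the odds ratio, and (OR - 1) × 100 is the percentage change in odds.

```python
import math


def fit_logistic(x, y, lr=0.5, iters=5000):
    """Fit P(y=1|x) = 1 / (1 + exp(-(b0 + b1*x))) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(iters):
        g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            # Gradient of the log-likelihood: (observed - predicted) times the regressor
            g0 += yi - p
            g1 += (yi - p) * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical data: x = 1 if exposed to a risk factor, y = 1 if burned out.
# Exposed: 30 burned out, 20 not. Unexposed: 20 burned out, 30 not.
x = [1] * 50 + [0] * 50
y = [1] * 30 + [0] * 20 + [1] * 20 + [0] * 30

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)
print(f"odds ratio: {odds_ratio:.3f}")
print(f"change in odds: {(odds_ratio - 1) * 100:.1f}%")
```

For a single binary predictor, the fitted odds ratio equals the cross-product ratio of the 2×2 table, here (30 × 30) / (20 × 20) = 2.25. By the same arithmetic, the 55.6% increase reported by Stack (2004) corresponds to an odds ratio of about 1.556.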
High-accuracy predictions can lead to the discovery of new potential variables. Moreover, Shmueli & Koppius (2011) show how predictive models can contribute to theory development. In the broadest sense, the aim of statistical prediction is limited to understanding the conditional distribution of a dependent variable y given other variables x (Varian, 2014). Hence the need for methods that enable the prediction of future observations.

Machine Learning Methods
Machine learning is a vast field with no single definition. In general, the term is understood in a broader and a narrower sense. In the broad sense, machine learning refers to the study of algorithms and systems that improve their knowledge or performance with experience (Flach, 2012). A more precise definition states that a program learns from experience E with respect to a class of tasks T and a performance measure P if its performance on tasks in T, as measured by P, improves with experience E (Mitchell, 1997).
Machine learning is part of the data mining process, which also includes methods for discovering connections and patterns in large data sets. Machine learning and data mining tools derive from artificial intelligence and multivariate statistics (Tan et al., 2005).
Most machine learning models focus on two tasks: description and prediction. Machine learning is rarely used for explanatory modelling because statistical models are better suited to it. To analyze the importance of job-related factors in burnout, generalized linear models were fitted as a machine learning approach; these algorithms estimate a set of parameters by maximizing the log-likelihood.
The results show that individuals with high neuroticism tended to have higher levels of burnout when facing stress, whereas those with high general self-efficacy had the lowest burnout.
The main difference is that predictive models involve a target variable, whereas descriptive models do not. The basic distinction among machine learning methods is between supervised and unsupervised learning: supervised learning requires labeled training data, whereas unsupervised learning does not. Most descriptive models belong to unsupervised learning (for example, association rule discovery and clustering), but supervised learning can also be used to build them, e.g. subgroup discovery (Flach, 2012). Prediction involves using some variables to predict unknown values of other variables, and supervised methods are mainly used for this purpose. Different types of target variables require different methods: classification is used for categorical targets and regression for continuous ones (Mitchell, 1997). There are, however, some exceptions; for example, k-Means-Mode can handle both numerical and categorical data. Paul and Hoque (2010) applied these unsupervised models to predict the likelihood of diseases, defining the probability of a disease in a cluster as the number of patients with the disease divided by the total number of patients in the cluster. The results showed that k-Means-Mode performed better than the classical k-Means algorithm.
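The cluster-level probability used by Paul and Hoque (2010) can be sketched with ordinary k-means. This is a simplification under stated assumptions: it uses plain k-means on numeric features only (not k-Means-Mode, which also handles categorical attributes), and the patient measurements, diagnoses, and initial centroids are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid updates."""
    centroids = [list(c) for c in centroids]
    assign = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        assign = [min(range(len(centroids)), key=lambda k: sq_dist(pt, centroids[k]))
                  for pt in points]
        # Update step: move each centroid to the mean of its members (keep it if empty)
        updated = []
        for k, old in enumerate(centroids):
            members = [pt for pt, a in zip(points, assign) if a == k]
            updated.append([sum(d) / len(members) for d in zip(*members)] if members else old)
        if updated == centroids:  # converged: centroids no longer move
            break
        centroids = updated
    return assign, centroids


def cluster_prevalence(assign, has_condition, k):
    """Probability of the condition in each cluster: cases / cluster size."""
    result = []
    for j in range(k):
        members = [h for a, h in zip(assign, has_condition) if a == j]
        result.append(sum(members) / len(members) if members else float("nan"))
    return result


# Hypothetical patients: two numeric measurements each, plus a 0/1 diagnosis
points = [(1.0, 1.0), (1.5, 2.0), (0.5, 1.5), (1.0, 0.5),
          (8.0, 8.0), (8.5, 7.5), (7.5, 8.5), (9.0, 9.0)]
diagnosed = [1, 0, 0, 0, 1, 1, 1, 0]

assign, centroids = kmeans(points, centroids=[points[0], points[-1]])
print("cluster labels:", assign)
print("P(disease | cluster):", cluster_prevalence(assign, diagnosed, k=2))
```

A new patient would then be assigned to the nearest centroid and given that cluster's prevalence as a predicted probability.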
At this point, it is worth re-emphasizing that predicting y as a function of features x is at the heart of machine learning, while statisticians and data mining experts also focus on discovering characteristics and patterns in data (Varian, 2014). Although machine learning methods can be used to refine explanatory models, they have one major drawback: interpretability. Statistical methods are easier to understand because they usually involve a small number of variables and report a measure of association, such as the odds ratio (OR) from logistic regression. Machine learning methods are often difficult to interpret; this is particularly true of neural networks and less so of LASSO regression (Goodfellow et al., 2016).
The development of predictive machine learning models is driven primarily by data, whereas conventional prediction models are driven by theory (Shmueli, 2011). This is an advantage for machine learning, as it makes it possible to develop new theories and to use large amounts of data. Integrating a huge number of factors is possible because machine learning addresses the problem of overfitting, which affects a predictor that performs well in-sample but fails out-of-sample; machine learning provides regularization techniques for this purpose. This set of tools also offers more methods for studying nonlinear relationships in the data: random forests, support vector machines, classification and regression trees (CART), penalized regression (LASSO, LARS, and elastic nets), and so on (Varian, 2014).
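Regularization can be illustrated with LASSO, one of the penalized regressions listed above. The sketch below is a minimal coordinate-descent solver for the objective (1/2n)·||y - Xb||² + λ·||b||₁. It assumes centred data (so no intercept is fitted) and uses a tiny hypothetical dataset; its purpose is only to show how increasing λ shrinks coefficients and sets some of them exactly to zero.

```python
def soft_threshold(rho, lam):
    """Shrink rho towards zero by lam; values inside [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0


def lasso(X, y, lam, iters=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    X is a list of rows; columns of X and y are assumed to be centred,
    so no intercept is estimated.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            # Correlation of feature j with the partial residual that excludes feature j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            b[j] = soft_threshold(rho, lam) / z
    return b


# Hypothetical centred data: y depends only on the first feature (y = 2 * x1)
X = [[-2, 1], [-1, -1], [0, 0], [1, -1], [2, 1]]
y = [-4, -2, 0, 2, 4]

for lam in (0.0, 1.0, 5.0):
    print(f"lambda = {lam}: coefficients = {lasso(X, y, lam)}")
```

With λ = 0 the solver recovers the least-squares coefficients (2, 0); λ = 1 shrinks the first coefficient to 1.5, and λ = 5 sets it to zero. Production implementations add convergence checks and feature standardization, which are omitted here.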

Using Machine Learning in Burnout Prediction
Although machine learning has been advancing for several years, it has only recently been applied in the behavioral sciences (DelPozo-Banos et al., 2018). For instance, these algorithms are used in computational psychiatry to improve the diagnosis of mental health problems such as stress (Silva et al., 2020), depression (Webb et al., 2020), and suicidality (Kessler et al., 2015). They make it possible to obtain detailed knowledge about diseases (Huys et al., 2016). These methods are especially important because they can analyze data from different sources. For example, Kaczor et al. (2020) used machine learning techniques to detect stressful situations using data from digital sensors worn by emergency medicine physicians and from a self-assessment questionnaire.
Machine learning methods are sometimes compared with the traditional approach. Mary & Jabasheela (2018) predicted depression, stress, and anxiety using different machine learning methods and compared the results with a logistic regression model; logistic regression had the highest accuracy. Kessler et al. (2016) identified levels of depression from survey data and found that machine learning algorithms achieved better results than conventional techniques.
Machine learning has also been used to identify levels of burnout. An example is the work of Bauernhofer et al. (2018), whose sample included 103 patients clinically diagnosed with occupational burnout. The patients completed questionnaires including the Maslach Burnout Inventory-General Survey (MBI-GS), the Recovery-Stress-Questionnaire for Work, and the Beck Depression Inventory. Cluster analyses were performed to explore burnout subtypes, with the MBI-GS subscales used as clustering variables. Three burnout subtypes were identified: the burned-out subtype, the exhausted/cynical subtype, and the exhausted subtype. The main results showed that the burned-out subtype was more depressed than the others, but no differences in stress or sociodemographic characteristics were found between the burned-out and exhausted/cynical subtypes.
The literature shows that machine learning methods for predicting occupational burnout have been applied to a few topics; some examples are described below. Lee et al. (2020) used k-means to group about one thousand nurses working in a medical center in Taiwan into two classes (burnout and non-burnout states). They then applied a convolutional neural network (CNN), a deep learning method, to build a predictive model that estimated 38 parameters for the burnout sample. Kurbatov et al. (2020) applied unsupervised k-means clustering (k-means analysis) and supervised clustering (k-means cluster group) to identify and predict burnout in surgical trainees. The results revealed three clusters with high, intermediate, and low risk of burnout. The high-risk cluster had a higher proportion of women and of single or unmarried people, which may be attributable to a lack of work-life balance.
A type of machine learning model, the multitask learning technique, was used by Taylor et al. (2020). Based on data collected from surveys, sensors, and smartphones, they predicted respondents' future mood, stress, and health. The empirical results showed that using these methods to account for individual differences significantly improved performance. Batata et al. (2018) proposed a benchmark of classification methods, using data collected from caregivers with a self-estimation indicator and the SRB burnout metric. The results showed that supervised classification techniques effectively predicted the level of burnout. Analysis of the significant features of the decision tree showed that burnout was affected mostly by exhaustion and financial situation. Zhernova et al. (2020) used the Maslach Burnout Inventory to predict early prerequisites of burnout; their machine learning approaches correctly predicted burnout in 70% of cases.

Discussion, Theoretical and Practical Implications
This article invites discussion on the use of machine learning methods to predict burnout and on their practical implications. It is one of the first reviews on this topic. Research from other fields, such as computational psychometrics and bioinformatics, shows the importance of developing these methods and how much remains to be done. This work also reveals a gap in the study of occupational burnout that social work researchers need to fill.
Dividing prediction methods by approach and aim makes it easier to select a method suited to a given research purpose. Machine learning algorithms can lead to new theoretical models of burnout. The factors influencing burnout may change because burnout is related to advancing globalization and technological progress (Chabot, 2019). This means that the new methods based on artificial intelligence presented in this article can lead to the discovery of new potential variables and help combat the prevalence of burnout. This summary also identifies potential directions for future research.
The main practical application of these models is in evidence-based approaches to creating data-driven policies and interventions. Most interventions have focused on the individual worker (Schaufeli & Enzmann, 1998). However, each individual may have a complex combination of significant factors; it is therefore important to provide methods that can handle large amounts of data. Drawing conclusions from such databases will enable decision-makers to take actions better tailored to the individual and to examine all workers in a given unit. This approach supports team-level action and helps direct appropriate resources to the institutions with the largest number of employees in need. This work points to the need for managers and policymakers in the human services sector to design new legislative actions aimed at managing burnout based on prediction. Before any preventive action can be taken, it is necessary to know where to direct it so that the right people are reached. Supporting decision-making in burnout monitoring systems with machine learning techniques also allows government programs on this topic to be evaluated. These methods can be used not only to intervene but also to prevent burnout; for example, they can help with the recruitment process.
Burnout meta-analysis emphasizes the role of individual factors in predicting burnout (Alarcon et al., 2009), so these methods may help to identify new employees at risk of burnout. This can prevent situations in which an employee starts work and then quits after a short time. These methods can also help retain existing workers by improving their work situation.
Reducing workload is also important for socio-economic reasons, since burnout is costly for individuals, employers, and societies (Bährer-Kohler, 2013). Cox et al. (2000) found that about 60% of absence from work is related to workplace stress. Machine learning can reduce the number of tasks by optimizing certain processes, which may substantially reduce the level of burnout. By predicting and identifying the level of this syndrome among employees, organizations can improve the quality of work and the workplace atmosphere, which can help employees cope with excessive stress.

Conflict of interest: The author declares no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.