Background

Various constructs and definitions of burnout have been published internationally. Maslach and Leiter [1] summarized burnout as the reflection of a ‘breakdown in the relationship of people with their work’ [2], and hence established a clear link between burnout and occupation. In contrast, other sources consider burnout to be a medical condition and assume a conceptual confusion of burnout and depressive disorders, thereby raising the question of whether burnout can be considered a distinct construct or rather a specific aspect of depressive disorders in terms of a burnout-depression overlap [3,4,5,6,7,8,9,10,11,12]. The conceptual inconsistencies regarding the definition and diagnosis of burnout are summarized by Mäkikangas and Kinnunen [13] as well as the Health Technology Assessment (HTA) Report [14]. The HTA report concluded that the great heterogeneity across studies and theoretical frameworks (e.g., in determining the types, development, or progression of symptoms) does not allow for a standardized, universal, and internationally accepted diagnosis of burnout. Disregarding (or possibly ending) this controversy, the World Health Organization (WHO) only very recently decided on a burnout definition and announced the inclusion of burnout in the 11th revision of the International Classification of Diseases (ICD-11) as an occupational phenomenon and not a medical condition [15]. In this framework, burnout is characterized by three dimensions: 1) ‘feelings of energy depletion or exhaustion’, 2) ‘increased mental distance from one’s job, or feelings of negativism or cynicism related to one’s job’, and 3) ‘reduced professional efficacy’. Thus, burnout represents a factor influencing the health status that ‘refers specifically to phenomena in the occupational context’ [15].
Even if burnout itself will not be considered an illness or a health condition, it has a negative impact on various professions [1, 4, 12, 16,17,18,19]. A recent systematic review found the presence of burnout to be significantly associated with a variety of adverse physical (e.g., coronary heart disease, diabetes, prolonged fatigue, hospitalization, pain), psychological (e.g., depressive symptoms, insomnia), and occupational consequences (e.g., absenteeism, job dissatisfaction, job demands, new disability pension) [20].

Apart from the discussion about the description of burnout as an occupational stress syndrome, the burnout facet of ‘feelings of energy depletion or exhaustion’ is a very unspecific stress symptom that is commonly observable in contexts other than the occupational one and is therefore not limited to employed or self-employed populations. In fact, exhaustion – and fatigue as well – represents a transdiagnostic phenomenon, observable in several physical (e.g. cancer) or mental health conditions (e.g. major depression or somatoform disorders). For example, fatigue is a common disease- and treatment-related symptom among cancer patients [21, 22], and lower quality of life in association with feelings of fatigue is observable in the general population [23]. Therefore, the accurate assessment of exhaustion is important not only in screening for the burnout syndrome but also for several kinds of stress-related health issues.

Assessment of burnout and exhaustion

The most frequently used psychometric tool to assess exhaustion in the context of burnout is the Maslach Burnout Inventory (MBI) [24], which comprises 22 items. The three main scales ‘emotional exhaustion’ (nine items), ‘depersonalization’ (five items), and ‘personal accomplishment’ (eight items) are in accordance with the current definition of burnout by the WHO [15]. ‘Emotional exhaustion’ assesses exhaustion at work (e.g. ‘I feel frustrated by my job.’), ‘depersonalization’ measures the extent to which individuals distance themselves mentally from their own work and from people at work (e.g. ‘I feel I treat some recipients as if they were impersonal objects.’), and ‘personal accomplishment’ asks how participants are performing at work (e.g. ‘I have accomplished many worthwhile things in this job.’). The MBI focuses not only on specific psychosomatic stress symptoms – the exhaustion component of burnout – but also considers other aspects such as depersonalization and professional (in)efficacy, which can have consequences not only for the affected individuals but, in the case of health professionals, for example, are also related to decreased patient safety [25]. However, the MBI was never intended as a diagnostic tool for clinical practice. In contrast, the Burnout Screening Scales (BOSS) by Geuenich and Hagemann [26, 27] were developed specifically as a screening tool for clinical practice to assess clinically relevant symptoms of occupational stress and burnout in the individual, additionally emphasizing stress components in different areas of life (not only the occupational situation) and psychosomatic symptoms (physical, cognitive, and emotional complaints). Stress-related mental disorders rarely result from chronic stress at work alone; they usually have multiple sources of distress in both work and family life.

The BOSS comprises three modules, BOSS I, BOSS II, and BOSS III, each of which can be used on its own. The BOSS I asks about stress and complaints, and the BOSS III about resources, each with the four subscales ‘occupation’, ‘own person’, ‘family’, and ‘friends’, referring to the last 3 weeks. In the present study, we focused on the BOSS II, which measures psychosomatic symptoms regarding different aspects of exhaustion. Compared to the MBI scale ‘emotional exhaustion’, the BOSS II asks specifically about different types of psychosomatic symptoms, covering ‘physical’ (e.g. sleeping problems), ‘cognitive’ (e.g. lower willingness to make decisions), and ‘emotional’ symptoms (e.g. fears about the future).

Areas of application of the BOSS II are occupational medicine, psychotherapy, psychosocial counselling, and general medical care [26]. In all of these settings, it is important to have psychometrically sound screening tools to complement clinical interviews and to assess mental health issues like stress-related exhaustion. Especially at first visits to doctors and therapists, it is helpful to get a reliable and valid overview of the patient’s situation quickly. There must be a balance between breadth and depth of the assessment. The earlier (increasing) exhaustion is recognized, the better it can be addressed in terms of prevention and treatment. As the BOSS II assesses exhaustion in more detail than the MBI, it can be used to measure not only burnout risk but also elevated exhaustion in general, which is not a burnout-specific symptom but is present in several clinical conditions.

Study objectives

The present study aimed at the development and psychometric evaluation of a shortened version of the BOSS II to provide an economic measure for the assessment of stress-related physical, cognitive, and emotional symptoms. Based on the original BOSS II with 30 items, the main goal was a version with only 15 items that does not compromise the psychometric quality of the measure. Beyond that, we explored which groups of the general population more frequently report physical, cognitive, and emotional complaints, respectively.

We expected to find a shorter version of the BOSS II with psychometric properties comparable to those of the original scale in terms of factor structure, internal consistency, and correlational pattern. As the BOSS II assesses physical, cognitive, and emotional symptoms of exhaustion, which are observable in a variety of physical and mental health issues, especially in depression, we expected to find similar differences in terms of gender and age as for major depression in each subscale, with female participants reporting more exhaustion than male participants, and older participants reporting more exhaustion than younger participants. Furthermore, we expected moderate positive associations between the BOSS II and the mental health conditions of depression and anxiety as well as a moderate negative association with quality of life.

Methods

Study procedure

The present study was designed as a cross-sectional study among the general population in Germany. The data collection took place in July and August 2011. It was conducted in cooperation with the independent demography research service USUMA Berlin (Unabhängiger Service für Umfragen, Methoden und Analysen, Berlin, Germany). The aim was to obtain a representative sample of the German general population. USUMA applied a multistage sampling method based on electoral districts, households, and persons in the household. In a first step, German regional areas were predefined using the reference system for representative studies in Germany provided by the ADM-Sampling-System. In this system, the total area of Germany is divided into 258 regions. Within each region, 17 target households were selected via random route procedures, leading to 4386 contacted households (258 × 17), and household members were randomly selected using the Kish selection grid. Eligibility criteria were sufficient German language skills and an age of at least 14 years. The survey comprised two written questionnaires. The first questionnaire contained sociodemographic and household information and was administered face to face by experienced and trained interviewers in order to control for representativeness of the sample. After that, participants answered the second part of the survey independently. During that time, the interviewer was still present and available for questions. All participants gave their informed consent before participation.

Participants

Among the 4386 contacted households and target persons, the total response rate was 59%, leading to a total sample of N = 2555 participants. We removed from the analysis all participants who had missing values on at least one of the BOSS items as well as participants under the age of 18. This led to a final analysis sample size of N = 2429. Table 1 provides sociodemographic characteristics of the final sample. The representativeness of the sample in terms of respondents’ age and gender could be confirmed by comparing the distributions with data provided by the Federal Statistical Office of Germany [28].

Table 1 Sample characteristics and one-way analyses of variance (ANOVA) for gender, age, education, marital status, and income

Instruments

The Burnout Screening Scales II (BOSS II) [26, 27] consist of 30 items addressing burnout-associated physical (‘I suffer from sleep disorders.’), cognitive (‘My willingness to make decisions has been lost.’), and emotional symptoms (‘I have fear of the future.’). Items are evenly distributed across domains (ten items each). The BOSS II asks respondents to what extent they suffered from each of the symptoms during the last 7 days, with responses ranging from 0 (‘does not apply’) to 5 (‘applies fully’). In the original 30-item version, internal consistency, calculated for different samples, ranged between α = .79 and α = .88 for ‘physical’, between α = .78 and α = .97 for ‘cognitive’, and between α = .81 and α = .96 for ‘emotional’ symptoms [26]. For each of these subscales, three types of values can be computed: total score, intensity value, and width value. In the current study, all calculations were based on the total score of each scale.

The European Quality of Life Scale (EuroQol) in its revised version 5L was used to assess health-related quality of life [29]. It consists of five items – utilizing five-point Likert scales with various wordings – measuring the extent to which respondents experience limitations in their daily life based on health issues. By reverse-coding, a quality of life index is obtained. Based on the sample of the present study, the coefficient of ω = .88 indicated good reliability.

To assess symptoms of depression, we used the PHQ-9 [30,31,32] depression module of the Patient Health Questionnaire (PHQ) [33]. It consists of nine items scoring from 0 (‘not at all’) to 3 (‘nearly every day’). In the present sample, internal consistency was high (ω = .91).

The Generalized Anxiety Disorder Scale-7 (GAD-7) [34, 35] is a brief measure for assessing generalized anxiety disorder and severity of general anxiety symptoms. It contains seven items ranging from 0 (‘not at all’) to 3 (‘nearly every day’). In the present sample, reliability was high (ω = .90).

Relevant sociodemographic parameters were gender (male or female), age, education (≤ 9 years, 10 years, ≥ 11 years), marital status (married, committed relationship, single, separated, divorced, widowed), employment (working full-time, working part-time, unemployed, retired, in training), and monthly net income (≤ 1500 €, < 2500 €, ≥ 2500 €), assessed in accordance with the demographic standards of the Federal Statistical Office of Germany.

Data analysis

All analyses were conducted using R [36]. Applied packages were lavaan [37], psych [38], semTools [39], and stuart [40]. First, we randomly split our full sample into an exploratory (n = 1197) and a confirmatory subsample (n = 1232). In order to reduce the initial item pool of 30 items while retaining the three-factor structure, we applied the R package stuart [40] to the exploratory subsample. Stuart uses ant colony optimization to construct subsets of possible items and compares them to find the optimal model solution – in terms of model fit and reliability – for a given number of items and factors. We constrained the search algorithm to look for three-factorial solutions with five items per factor, and to prefer solutions that are invariant across respondent gender.
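
As an illustration only, the random split into an exploratory and a confirmatory subsample could be sketched as follows. This is a minimal Python sketch, not the R code used in the study; the function name `split_sample`, the seed, and the use of consecutive integers as participant identifiers are assumptions made for the example.

```python
import random


def split_sample(ids, n_exploratory, seed=0):
    """Randomly partition participant IDs into two disjoint subsamples."""
    rng = random.Random(seed)  # seed is an assumption; none is reported in the text
    shuffled = list(ids)
    rng.shuffle(shuffled)
    return shuffled[:n_exploratory], shuffled[n_exploratory:]


# Hypothetical IDs 0..2428 stand in for the N = 2429 participants.
participant_ids = range(2429)
exploratory, confirmatory = split_sample(participant_ids, 1197)
print(len(exploratory), len(confirmatory))
```

Fixing the seed makes the partition reproducible, which matters because the item selection is derived from one half and tested in the other.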

The solution generated by stuart was tested in the confirmatory subsample, using confirmatory factor analysis (CFA) with robust maximum likelihood estimation (MLM) and robust formulas for estimating fit indices [41, 42]. To evaluate model fit, we referred to the χ2-test, interpreting χ2 as stated by Hu and Bentler [43] as well as Schermelleh-Engel et al. [44], according to whom χ2 should ideally be non-significant, and the ratio of χ2 and degrees of freedom (df) should be smaller than 2 (or 3) to indicate good (or acceptable) fit. However, as these statistics are biased by sample size, we relied additionally on the following indices in evaluating model fit: the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI), which should both be greater than .95; the Root Mean Square Error of Approximation (RMSEA) and its 90% confidence interval, which should be smaller than .05 (or .08) to indicate good (or acceptable) fit; and the Standardized Root Mean Square Residual (SRMR), which should be smaller than .05 (or .10) to signify good (or acceptable) fit. We report McDonald’s ω as a measure of internal consistency [45].
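
The cutoffs listed above can be collected in a small helper. The following Python sketch is an illustration under stated assumptions, not part of the original analysis: the function names are hypothetical, only the RMSEA point estimate (not its confidence interval) is rated, and values failing the single CFI/TLI cutoff are simply labeled ‘below cutoff’. The example values are those reported in the Results for the model selected by stuart.

```python
def rate_lower_is_better(value, good, acceptable):
    """Rate a statistic for which smaller values indicate better fit."""
    if value < good:
        return "good"
    if value < acceptable:
        return "acceptable"
    return "poor"


def evaluate_fit(chi2, df, cfi, tli, rmsea, srmr):
    """Apply the cutoffs named in the text to one set of fit statistics."""
    return {
        "chi2/df": rate_lower_is_better(chi2 / df, 2, 3),
        "CFI": "good" if cfi > 0.95 else "below cutoff",
        "TLI": "good" if tli > 0.95 else "below cutoff",
        "RMSEA": rate_lower_is_better(rmsea, 0.05, 0.08),
        "SRMR": rate_lower_is_better(srmr, 0.05, 0.10),
    }


# Values reported in the Results for the selected 15-item model.
ratings = evaluate_fit(chi2=613.625, df=210, cfi=0.961, tli=0.960,
                       rmsea=0.059, srmr=0.047)
for index, rating in ratings.items():
    print(index, rating)
```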

To investigate group differences, we conducted one-way Analyses of Variance (ANOVA) in the sociodemographic variables gender, age group in years, education, marital status, employment, and monthly net income. To interpret effect sizes of the results, we used partial eta-squared (ηp2). ηp2 values of .01 can be interpreted as small, values of .06 as medium, and values of .14 as large [46].
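
For readers unfamiliar with the effect size, the following Python sketch shows how ηp2 can be obtained in a one-way design, where it equals the ratio of the between-group sum of squares to the total sum of squares. The data are made up for illustration, the function names are hypothetical, and treating the benchmarks of .01, .06, and .14 as lower bounds of the respective categories is a common convention assumed here rather than something stated in the text.

```python
def partial_eta_squared(groups):
    """Partial eta-squared for a one-way design: SS_between / (SS_between + SS_within)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return ss_between / (ss_between + ss_within)


def label_effect(eta_p2):
    """Map an effect size to the verbal labels used in the text."""
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "medium"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"


# Made-up scores for three groups (illustration only).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
toy_effect = partial_eta_squared(toy_groups)
print(round(toy_effect, 4), label_effect(toy_effect))

# Two effect sizes reported in the Results (age group, physical and cognitive).
print(label_effect(0.155), label_effect(0.048))
```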

Measurement invariance was tested using the customary procedure of comparing increasingly restrictive structural equation models representing increasingly strict levels of invariance: equal factor loadings for metric invariance, equal item intercepts for scalar invariance, and equal residual variances for strict invariance [47]. Differences between models of ≤ .01 in CFI and gamma hat (GH) are evidence for invariance [48]. Only when metric and scalar invariance are met can latent and observed mean scores be compared reasonably [49]. When strict invariance holds, this further means that all differences in observed variances are caused by differences in latent variances, that is, in the construct under study [49, 50].
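
The decision rule can be illustrated with a short Python sketch. The fit values below are hypothetical and are not taken from Table 4; the function name is likewise an assumption. Each model is compared with the next, more restrictive one, and a step counts as supported when both differences are at most .01.

```python
def compare_invariance_models(fits, threshold=0.01):
    """Compare each model with the next, more restrictive one.

    fits: list of (level, cfi, gamma_hat), ordered from least to most restrictive.
    Returns one (level, delta_cfi, delta_gh, holds) tuple per comparison.
    """
    results = []
    for (_, prev_cfi, prev_gh), (level, cfi, gh) in zip(fits, fits[1:]):
        # Round to three decimals, as fit indices are reported, to avoid float noise.
        delta_cfi = round(abs(prev_cfi - cfi), 3)
        delta_gh = round(abs(prev_gh - gh), 3)
        holds = delta_cfi <= threshold and delta_gh <= threshold
        results.append((level, delta_cfi, delta_gh, holds))
    return results


# Hypothetical fit values for illustration only.
fits = [
    ("configural", 0.962, 0.975),
    ("metric", 0.960, 0.974),
    ("scalar", 0.955, 0.970),
    ("strict", 0.930, 0.950),
]
results = compare_invariance_models(fits)
for row in results:
    print(row)
```

In this made-up example, metric and scalar invariance would be supported, whereas the step to strict invariance would not.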

Finally, we report normative percentile values stratified by gender and age group.
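
To make the idea of stratified norms concrete, the following Python sketch computes percentile ranks within gender-by-age strata. It uses one common definition of the percentile rank (the percentage of norm-group scores at or below a given score); the records, the age-group labels, and the function names are hypothetical and do not reproduce the study data or the exact procedure behind the supplementary tables.

```python
from collections import defaultdict


def build_norms(records):
    """Group scores by (gender, age_group) stratum."""
    norms = defaultdict(list)
    for gender, age_group, score in records:
        norms[(gender, age_group)].append(score)
    return norms


def percentile_rank(norm_scores, score):
    """Percentage of norm-group scores at or below the given score."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100.0 * at_or_below / len(norm_scores)


# Hypothetical records: (gender, age group, subscale score).
records = [
    ("female", "18-39", 2), ("female", "18-39", 6),
    ("female", "18-39", 10), ("female", "18-39", 14),
    ("male", "60+", 4), ("male", "60+", 12),
]
norms = build_norms(records)
rank = percentile_rank(norms[("female", "18-39")], 10)
print(rank)
```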

Results

Development and psychometric properties of the BOSS II-short

As outlined above, we used stuart [40] to find the optimal solution for a short form of the BOSS II. Among all 16,003,008 possible models, the algorithm selected the configuration presented in Table 2. This model had good overall fit: χ2(210) = 613.625, χ2/df = 2.922; p < .001, CFI = .961, TLI = .960, RMSEA = .059 (.053; .064), SRMR = .047.
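
The size of the search space and the reported χ2/df ratio can be checked with a few lines of Python. The sketch assumes that each of the three subscales independently contributes five of its ten items, which is consistent with the constraints described in the Methods and reproduces the reported number of candidate models.

```python
import math

items_per_scale = 10  # items per subscale in the original BOSS II
items_kept = 5        # items retained per subscale in the short form
n_scales = 3          # physical, cognitive, emotional

# Number of ways to choose 5 of 10 items within one subscale.
per_scale = math.comb(items_per_scale, items_kept)
# Subscales are combined independently, so the counts multiply.
total_models = per_scale ** n_scales
print(per_scale, total_models)

# Ratio of chi-square to degrees of freedom for the selected model.
chi2_df_ratio = round(613.625 / 210, 3)
print(chi2_df_ratio)
```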

Table 2 Item descriptive statistics of the short version of the Burnout Screening Scales II (BOSS II-short)

As reported in Table 2, descriptive statistics for all items – in addition to the subscale scores – were good for the most part. The corrected item-total correlations exceeded .500 for all scales, and are thus satisfactory [51]. For 10 of the 15 items, skewness and kurtosis were within the limits of absolute skewness < 2 and absolute excess kurtosis < 4 suggested by Kim [52]. The remaining five items exceeded these limits, indicating non-normal distributions; specifically, they were right-skewed and deviated slightly from the normal distribution.

To evaluate the BOSS II-short’s factor structure, we also tested the model constructed in the exploratory subsample in the confirmatory subsample. Model fit of the 15-item solution can be considered very good. Detailed statistics of these analyses can be found in Table 3. Standardized factor loadings exceeded .500, and for all but one indicator .600. Factor inter-correlations were high, rphysical, cognitive = .751, rphysical, emotional = .687, and rcognitive, emotional = .853. These values are somewhat higher than desirable. However, they are still lower than those of the original 30-item BOSS II (rphysical, cognitive = .767, rphysical, emotional = .716, rcognitive, emotional = .872). Internal consistency was very good, albeit slightly reduced compared to the original BOSS II, ωphysical = .858, ωcognitive = .935, and ωemotional = .923. Overall, the model showed very good fit in all measures.
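
As a reminder of how McDonald’s ω relates to the factor loadings, the following Python sketch computes ω for a single factor from standardized loadings, assuming a congeneric model with uncorrelated residuals so that each residual variance equals 1 − λ². The loadings are hypothetical and are not the estimates from Table 2 or Table 3, and the study’s own ω values may have been computed differently.

```python
def mcdonalds_omega(loadings):
    """Omega for one factor from standardized loadings (uncorrelated residuals assumed)."""
    sum_loadings_sq = sum(loadings) ** 2
    residual_variance = sum(1 - l ** 2 for l in loadings)
    return sum_loadings_sq / (sum_loadings_sq + residual_variance)


# Hypothetical standardized loadings for a five-item subscale.
example_loadings = [0.70, 0.75, 0.80, 0.70, 0.65]
omega = mcdonalds_omega(example_loadings)
print(round(omega, 3))
```

Because the numerator grows with the squared sum of the loadings, a five-item subscale can still reach high ω when its items load strongly, which is in line with the modest reduction relative to the ten-item subscales.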

Table 3 Model fit indices for the Burnout Screening Scales II (BOSS II), original and short version

The results of the ANOVAs comparing different categories of gender, age group, education, marital status, employment status, and monthly net income for the three BOSS II-short subscales are depicted in Table 1. We found statistically significant results for all comparisons. However, the majority of comparisons showed small or negligible effect sizes. There were very small gender differences for all three BOSS II-short subscales, with female participants reporting more physical symptoms, and male participants more cognitive and emotional symptoms. Older participants reported more physical and cognitive symptoms. For age group, there was a large effect in the subscale physical complaints (ηp2 = .155), explaining 16% of variance, and a small effect in cognitive symptoms (ηp2 = .048). Regarding employment, the proportion of explained variance for physical symptoms was about 16% (ηp2 = .156), with retired and unemployed participants reporting the most symptoms. Looking more closely at different employment groups, retired participants had the highest scores in physical and cognitive symptoms, and unemployed participants reported the most emotional exhaustion. In all three BOSS II-short subscales, part-time working participants had more symptoms of exhaustion than full-time working persons.

Measurement invariance of the BOSS II-short

Regarding measurement invariance of the BOSS II-short, the analyses revealed clear evidence for strict invariance across respondent gender with all CFI and GH comparisons revealing very small deviations (see Table 4). In contrast, there were large deviations when considering participant age: we found clear evidence for metric invariance but scalar invariance was only achieved by freeing the intercepts of Items 6 and 10 to vary between groups.

Table 4 Tests of measurement invariance of the short version of the Burnout Screening Scales-II (BOSS II-short)

Further validity aspects of the BOSS II-short and normative values

Regarding the BOSS II-short’s convergent validity, we found the expected pattern of correlations reported in Table 5. The BOSS II-short – as a measure of exhaustion – correlated positively with symptoms of depression and anxiety, and negatively with quality of life. Moving from the 30-item to the 15-item version of the BOSS II, we naturally observed a reduction in variance explained. This decline was significant for four of the nine pairs of associations, |Δr| ≤ .042, Δz ≤ 2.57. Yet, the effect sizes were very small, making up less than 1% of overall variance.

Table 5 Correlations of the study variables

Finally, we calculated norm values for the BOSS II-short based on our representative sample. We report percentile ranks categorized by respondent gender and age in Supplementary Tables 1 and 2 (Additional file 1).

Discussion

The aim of the present study was the construction and psychometric evaluation of a shortened version of the Burnout Screening Scales II (BOSS II) [26, 27], a measure of physical, cognitive, and emotional symptoms that often occur with burnout but also with health issues outside of the occupational context, making exhaustion a transdiagnostic phenomenon. For epidemiological as well as etiological research in health and clinical psychology, psychotherapy, and psychosomatic medicine, it is vital to understand how different psychosocial phenomena are intertwined, and therefore, they need to be assessed simultaneously. However, vast numbers of instruments and items can be extremely time consuming, burdening, or even exhausting for participants in such studies. Hence, researchers are keen to keep the number of items as small as possible while retaining as much information as possible. Therefore, short screening instruments are important when aiming to cover several different topics and questions simultaneously.

The final short version of the BOSS II with 15 items (BOSS II-short) showed excellent model fit for the hypothesized three-factor solution. Each subscale comprises five items and shows good (‘physical symptoms’) or very good internal consistency (‘cognitive symptoms’, ‘emotional symptoms’). To our knowledge, this study is the first to investigate the BOSS II (or a short form of it) for measurement invariance. Specifically, we found that the measurement model is equivalent for men and women but not across the age spectrum. Invariance in the measurement of the burnout facet ‘exhaustion’ across gender was demonstrated by previous research for various burnout scales [53,54,55,56]. Therefore, it is likely that exhaustion is characterized by similar physical, cognitive, and emotional symptoms for male and female participants. In contrast, age-related invariance seems unclear for common burnout measures [57,58,59,60], which is also reflected in the present study: metric invariance held, but scalar invariance was only achieved after relaxing two intercept constraints. However, this is to be expected given the nature of the item content. Both of these items belong to the BOSS II-short ‘physical’ subscale and describe phenomena that have been shown to be generally more common in older populations, like joint pains and high blood pressure [61, 62]. Strict invariance was only attainable by releasing constraints for seven residual variances. Thus, the interpretation of the BOSS II-short as strictly invariant across age would be highly questionable. We do, however, find strong evidence for partial scalar invariance, which is a sufficient condition for meaningful group mean comparisons. This observation fits with the higher burden of disease in older persons [63].

There was almost no loss in validity when comparing the BOSS II-short to the original BOSS II, with explained variances greater than .90. Similarly, there were minimal reductions in the associations of the BOSS II-short and external measures (depression, anxiety, and quality of life). Four of the nine differences were significant. However, the very small effect sizes of the differences (R2 < .01) indicate that the long and short forms of the BOSS II are related to depression, anxiety, and quality of life in similar ways. Therefore, with the BOSS II-short, one can obtain (very close to) the same information by asking only half of the questions.

We observed some differences in symptoms of exhaustion between the categories of the variables gender, age group in years, education, marital status, employment, and monthly net income. These effects were the strongest for the subscale ‘physical’, particularly for age group and employment. As we also considered retired participants, who scored the highest for physical and cognitive symptoms, these values are most likely confounded with the age group or the burden of disease of these participants. In Germany, the regular age for retirement is between 63 and 65 years, and those retiring earlier do so because of medical or other demanding conditions (e.g. care for family members) that prevent them from continuing to work. Beyond that, differences in gender and age could be due to different adaptation to the concrete work setting. For example, a longitudinal study investigating activity-based flexible offices (A-FO), that is, open-space work settings with a flexible work time and work space organization, indicated that after moving to an A-FO, employees showed worsened work engagement and increased levels of fatigue [64]. These effects differed between men and women as well as between employees of different ages. Therefore, subjective evaluation of the work setting and its conditions should help to further understand gender and age differences.

Most interesting is the fact that the unemployed participants reported the highest emotional exhaustion, emphasizing that the BOSS II-short is not limited to the occupational context but extends to other aspects of life, where, for example, the absence of an occupation is a major stress event. This is in line with authors who claimed that workers’ occupational health should not be seen in isolation but in context with other factors of stress or individual conditions [65]. Such a perspective could also explain why, in our study, part-time working participants were physically, cognitively, and emotionally more exhausted than full-time working persons. This could be due to the fact that beyond occupational duties there are other major life stressors to coordinate, such as family life, caregiving, one’s own medical care, or other issues. Beyond that, and throughout our analyses, participants with a lower level of education as well as people with a lower income felt physically, cognitively, and emotionally more exhausted. These observations are in line with research in social epidemiology, which finds a higher burden of disease in older, unemployed, poorer, and less educated people, who would therefore need more public support in the prevention of mental disorders or physical health conditions [66, 67].

Strengths and limitations

The major strength of our study is the thorough statistical approach allowing us to successfully shorten the BOSS II from 30 to 15 items without substantial loss of information. We based the calculations on a large sample of the German general population, making it possible to screen the level of exhaustion in the population by gender, age, employment status, marital status, income level, and educational level. While we took a more general look at physical, cognitive, and emotional symptoms of exhaustion throughout the German general population, we did not concretely assess the three criteria of burnout, and hence cannot draw conclusions about the source of the personal exhaustion. With regard to physical symptoms, age-related problems might be a relevant confounder, so that results have to be interpreted carefully when looking at the different employment groups. Interpretation of results is further limited by the fact that we investigated the BOSS II-short and the original BOSS II in the same sample. There is no reason to expect systematic biases in our analyses, but future research should nonetheless aim to confirm the present findings in a sample that completes only the new 15-item BOSS II-short. Additionally, it is important to note that the BOSS II-short is only a self-report tool and therefore assesses burnout risk or the risk of clinically relevant exhaustion rather than the presence of burnout or a mental health condition. An elevated score in the BOSS II-short should be followed by a clinical interview conducted by healthcare personnel in order to establish a diagnosis. Future studies could address this aspect, looking for consistency between self-report (BOSS II-short) and clinical interviews. Furthermore, physical exhaustion could also be assessed with objective medical tests, for example of slowed reflexes or short-term memory problems.
Finally, to address the overlap of burnout risk and exhaustion, a direct comparison with the latest version of the MBI could help to understand how the BOSS II-short and the MBI capture different constructs and are applicable to different contexts.

Conclusion

The BOSS II-short, comprising only 15 items, has good psychometric properties and can add important insight for both epidemiological research and clinical practice. It is particularly useful because of its brevity, with minimal information loss compared to the original 30-item version. Additionally, our analyses provided first normative values for physical, cognitive, and emotional symptoms assessed with the BOSS II-short, making it easily accessible for application in practice.

In summary, the BOSS II-short represents a very efficient and informative assessment tool, economically applicable in large scale surveys or for initial individual assessments in clinical care. Its use in epidemiological research might help to provide a better understanding of public (mental) health.