1 Introduction

1.1 The adult population gap in bullying instruments

Bullying, or the repeated hurtful treatment of a targeted person who struggles to stop it, is a widespread and serious problem in schools [1, 2]. Although these experiences can affect an individual across the lifespan [3,4,5,6], there are surprisingly few options for measuring their severity in adults. The respondents of bullying-focused instruments can be peers, teachers, parents, or the experiencer themselves [7, 8]. The experiencers in focus can be children, adolescents, or adults [9]. However, most measures focus on current childhood bullying experiences. We are aware of only two retrospective instruments for adults. One [10] is extensive and idiographic, and so helpful in clinical contexts, but less so in research or clinical contexts that require efficient administration or graded scales. The other measure focusses solely on victims of bullying and no other bully-roles [11]. Instruments focussed on past traumatic events (Traumatic Life Events Questionnaire (TLEQ) [12]; Life Events Checklist (LEC) [13]; Brief Trauma Questionnaire (BTQ) [14]) either include only a binary item on past bullying (TLEQ) or make no specific mention of it (BTQ, LEC).

The types of bullying behaviour that researchers have chosen to study have also varied with respect to both bullying acts and participant roles. Bullying behaviours can be either direct (e.g., stealing, insulting, attacking, threatening) or indirect (e.g., rumour spreading, exclusion) [15, 16], and studies also differ in whether they consider the person as a victim, perpetrator, or witness to these acts. It seems common for children to hold multiple bullying roles simultaneously or to move between them across time [9, 17,18,19]. This may be influenced by the social environment [20, 21]. In one study, for example, teacher intervention appeared to influence a child’s adoption of specific bullying roles [22]. Jenkins and Nickerson [23] suggested understanding the extent to which a student engages in different bully-roles, rather than categorising students into distinct groups. Further, a recent review has called for better multidimensional measurement of bullying [24]. Again, existing measures that incorporate different bullying roles (victim, perpetrator, witness/bystander) are mainly focussed on children’s and adolescents’ present-day experiences [1, 25,26,27,28,29,30].

In children, experiences of bullying are associated with disrupted social relationships [31, 32], emotional disturbance [2, 33,34,35], traumatic stress responses [36], future victimisation [37] and lower general health [38]. Research suggests, however, that the impact of bullying can persist into adulthood, affecting interpersonal trust, self-esteem, and feelings of power and safety. Adult outcomes of bullying include various mental health problems, particularly if one was frequently exposed [19, 39]. This general impact is seen across the roles of the bully, victim, or bully-victim; however, there is also evidence of bully-role-specific outcomes. As adults, childhood victims have reported more clinical conditions such as social anxiety [40], depression, suicidal ideation and borderline personality disorder [41,42,43,44], and adulthood psychosis [45, 46]. Victims have also been found to report more somatic problems such as headaches, difficulty sleeping, and stress [34]. There are also indications of increased social sensitivity, higher rejection sensitivity [47] and some association between victimhood and making and maintaining friends in adulthood [48]. Similarly, self-blame and shame have been considered both an outcome [49, 50] and a mediator of negative outcomes stemming from childhood bullying experiences [44].

Bullying perpetration has been seen as a risk factor for adulthood offending [51] as well as for increased aggression in adolescence [31]. Bully-victims have a greater likelihood (compared to victims and perpetrators) of psychiatric problems, poor physical health, and disrupted social relationships [39]. They show lower self-control, social acceptance, and self-esteem [52,53,54] and more conduct problems [55].

There is limited research about the adult outcomes for witnesses to bullying and those who inhabited more than one bullying role. However, based on research with children and adolescents, there is reason to suspect psychological disturbance in witnesses as well [38, 42]. Researchers have also found different bullying behaviours (e.g., direct vs. indirect, verbal vs. physical) to have different impacts [19]. Hence, examining various bullying behaviours across different bully roles in one instrument would allow for a fuller examination of associated adult outcomes in research. In clinical contexts too, understanding a person’s bully-role-related trauma (e.g., that of a bully-victim) can provide treaters with important background information.

1.2 The Bullying and Exclusion Experiences Scale (BEES)

The BEES was developed to enable adults to retrospectively report school-age experiences across multiple bullying roles (victim, witness, perpetrator) and a variety of bullying behaviours. An earlier version (the Brief BEES) covered five domains of bullying behaviour: disinformation, physical/intimidation, verbal, exclusion, and power abuse (by a person of authority) for each bullying role (victim, witness, perpetrator). This version showed good reliability for the overall scale (α = 0.85), and for the subscales victim (α = 0.82), witness (α = 0.85) and perpetrator (α = 0.79). Further, it correlated with expected adult outcomes of childhood bullying: depression, anxiety, stress, rejection sensitivity, aggression, and shame [56, 57]. Past victimhood was particularly associated with current depression, rejection sensitivity, and aggression, while perpetration was most strongly associated with aggression, and an overall scale comprising all roles was the best predictor of adult psychological problems (the three roles were also intercorrelated).

The current version of the BEES was developed to enable a more differentiated assessment of past bullying. To this end, physical bullying and intimidation were divided into separate domains, the domain of property damage was added, and power abuse was discarded, leaving six domains: disinformation, physical, verbal (renamed denigration), exclusion, intimidation, property damage.

The BEES will be examined in the current study with respect to: 1. Whether its underlying factor structure and item characteristic curves (Rasch model) are consistent with the above proposed distinction between different bullying roles (construct validity); 2. Whether its scales correlate with a global self-report of bullying experience (concurrent validity) and in expected ways with related constructs (convergent validity) and antithetical constructs (discriminant validity) representing present mental health and school-age experience.

It was hypothesised that: (1) The underlying factors of the BEES as determined through principal components analysis will be indicative of separate bullying roles (victim, witness, perpetrator), and each will show accordance with a polytomous Rasch model (construct validity); (2) The BEES subscales will correlate respectively with three global items asking participants if they consider themselves to have been involved (as a victim, witness, perpetrator) in bullying during their childhood (based on an extensive definition); and (3) The BEES scales will show expected associations with school-age and present-day experiences, as follows:

  a. The Victim subscale will be associated with school-age emotional problems and peer problems, as well as present-day emotional symptoms (depression, anxiety, stress), but negatively with peer support.

  b. The Witness subscale will be associated with higher levels of past emotional problems and present-day emotional symptoms (depression, anxiety, stress), but negatively with prosocial behaviour.

  c. The Perpetrator subscale will be associated with school-age conduct problems and with present-day emotional symptoms (depression, anxiety, and stress).

  d. The overall (multiple role) scale will be associated with higher levels of school-age problems and current symptoms (depression, anxiety, stress).

2 Method

2.1 Participants

To determine the desired sample size, recommendations for exploratory factor analysis (EFA) and Rasch model analysis were considered first. For EFA, based on a ratio of 10–15 subjects per variable, a sample size of up to 270 would be required [58]. For Rasch analysis, a sample size of 144 is considered appropriate to achieve item difficulties and person measures that are stable (within 0.5 logits) at a confidence interval of 95% [59]. The achieved sample size of 346 covered the above requirements well.
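The sample size reasoning above can be checked with a few lines of arithmetic. The sketch below assumes that the variables in question are the 18 BEES items (the count is not stated at this point in the text, although 18 × 15 = 270 matches the figure given); the function and constant names are illustrative only.

```python
def efa_sample_range(n_variables, min_ratio=10, max_ratio=15):
    """Return the (lower, upper) sample sizes implied by a subjects-per-variable ratio."""
    return n_variables * min_ratio, n_variables * max_ratio


N_ITEMS = 18          # assumption: one variable per BEES item
RASCH_MINIMUM = 144   # Rasch recommendation quoted in the text
ACHIEVED = 346        # completed responses reported in the text

efa_low, efa_high = efa_sample_range(N_ITEMS)

# The achieved sample should cover the more demanding of the two recommendations.
sample_is_sufficient = ACHIEVED >= max(efa_high, RASCH_MINIMUM)
print(efa_low, efa_high, sample_is_sufficient)
```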

Following ethics approval from the Cairnmillar Institute Ethics Committee, participants were recruited online, most (304; 87.9%) through the ‘SampleSize’ subreddit on Reddit, and 42 (12.1%) through Facebook. Data were collected between November 2020 and December 2021. A total of 521 participants started the survey and 346 completed it, with 116 males (33.5%), 203 females (58.7%), and 27 (7.8%) in other categories. The mean age was 26.3 years (SD = 8.18). The largest group (48%) were fulltime students or unemployed, with 102 (30%) in fulltime work, 37 (11%) in parttime work, 35 (10%) in either casual or contractual work, and 7 (2%) endorsing either retired or volunteering. Most (197; 57%) endorsed being single, with 91 (26%) in a relationship but not married, 55 (16%) married, and only three (1%) either separated or divorced.

2.2 Procedure

Participants filled out all questionnaires anonymously online in the order corresponding to the below list. Consent was given by starting the survey after reading the plain language information statement. Participants were required to be fluent in English and at least 18 years of age. No incentives were given.

2.3 Materials

Global measure of self-reported bullying experiences: Three global binary items of self-assessed bullying experience asked whether each bullying role (victim, witness, perpetrator) was experienced or not, similar to previous research testing concurrent validity in a child-oriented scale [1]. The given definition of bullying can be found in Appendix A.

Bullying and Exclusion Experiences Scale (BEES): The BEES is an 18-item measure which asks participants to retrospectively report on bullying and exclusion experiences in school years on a 5-point Likert scale. For each of the three roles (victim, witness, perpetrator), it covers six different behaviour clusters: denigration, intimidation, exclusion, disinformation, physical acts, and property harm. A list of all BEES items can be found in Table 1. Respondents are asked to indicate how often they experienced each kind of bullying behaviour in each role during school years (0 = Never, 1 = Rarely, 2 = Occasionally, 3 = Often, 4 = Very often). For the witness role, we examined passive experiencers (‘I witnessed it without helping’). Sum scores can be calculated for each bullying role (ranging from 0 to 24) and for an overall (multiple role) sum score (ranging from 0 to 72), with high scores indicating more pervasive bullying role experience.
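The scoring rule described above can be sketched as follows. The dictionary layout, the short domain labels, and the function name are assumptions made for illustration; only the 0 to 4 response range and the resulting sum-score ranges come from the text.

```python
ROLES = ("victim", "witness", "perpetrator")
# Short labels for the six behaviour clusters (labels are illustrative).
DOMAINS = ("denigration", "intimidation", "exclusion",
           "disinformation", "physical", "property")


def score_bees(responses):
    """Sum 0-4 item responses into three role scores and one overall score.

    `responses` maps (role, domain) pairs to an integer from 0 (Never)
    to 4 (Very often); all 18 pairs must be present.
    """
    scores = {}
    for role in ROLES:
        total = 0
        for domain in DOMAINS:
            value = responses[(role, domain)]
            if value not in range(5):
                raise ValueError(f"response out of range for {role}/{domain}: {value}")
            total += value
        scores[role] = total              # 0 to 24 per role
    scores["overall"] = sum(scores[role] for role in ROLES)  # 0 to 72
    return scores
```

For example, a respondent who answers "Occasionally" (2) to every victim item and "Never" (0) elsewhere would receive a victim score of 12 and an overall score of 12.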

Table 1 Items of the BEES

Peer social support scale (PSS): Six items adapted from the classmate support section of the 40-item Child and Adolescent Social Support Scale (CASS) [60] were used. Items were reformulated as a retrospective view on childhood experiences (e.g. “My classmates would ask me to join activities”, rated on a six-point scale from “Never” to “Always”). This adapted scale ranges from 6 to 36, with higher scores denoting more peer support. The CASS classmate scale shows excellent internal consistency (Cronbach’s α = 0.93–0.94) and good test–retest reliability (r = 0.60–0.70), good convergent validity with a comparable social support instrument (r = 0.59), and discriminant validity with internalizing (r = −0.34), externalizing (r = −0.25), and behaviour symptoms (r = −0.39).

Strengths and Difficulties Questionnaire (SDQ): Selected subscales from the baseline youth version of the Strengths and Difficulties Questionnaire (SDQ Youth S11-17) [61, 62] were used in a modified form, reworded to focus retrospectively on school-age experience from an adult perspective. Example items for each scale include, for Emotional symptoms: “I worried a lot”; Conduct problems: “I fought a lot. I could make other people do what I wanted”; Peer problems: “Other young people picked on me or bullied me”, and Prosocial behaviour: “I was helpful if someone was hurt, upset or feeling ill”. Each subscale is formed from five items on a 3-point scale (1 = “Not true”, 2 = “Somewhat true”, and 3 = “Certainly true”). Scores range from 5 to 15 for each subscale, with higher scores denoting either problems (Emotional Symptoms, Conduct Problems, Peer Problems) or positive behaviours (Prosocial Behaviour). The SDQ has been found to show good psychometric properties including internal consistency, validity, and construct validity [63]. It has shown satisfactory internal consistency as well as test–retest reliability on teacher- and parent-rated versions [64], good evidence for convergent and divergent validity, and good criterion validity in discriminating between community and clinical populations [65].

Depression Anxiety Stress Scale 21 (DASS-21): The DASS-21 [66] is a widely used 21-item measure of negative emotional state over the last week that relates to symptoms of depression, anxiety, or stress, using a 4-point Likert scale. Sum scores are calculated for depression, anxiety, and stress scales separately, each ranging from 0 to 21, with higher scores representing higher symptomatology. Its reliability and validity are well established across a wide variety of cultures and ethnic groups [67]. The DASS-21 displays good reliability (α > 0.74) across all scales and good criterion validity in differentiating between cases and non-cases [68].

2.4 Data analysis

Based on the Shapiro–Wilk test, non-normality was found in all scales. Examining the plots of these scales showed adequate approximations of normality on most scales except Perpetrator, Anxiety, and Conduct Problems (which were all negatively skewed, especially perpetration), so tests of association were done using Spearman rank correlations and tests of group difference were done using Mann–Whitney U-tests. The analyses used were generally robust towards violations of normality (non-parametric tests, exploratory factor analysis, Rasch analysis), although awareness of the distributions of those three variables should be maintained when interpreting the results.
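To illustrate why a rank-based correlation is insensitive to skew, the following minimal sketch computes Spearman's rho as a Pearson correlation on average ranks. It uses only the Python standard library and is not the software used in the study; the function names and example data are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only the ordering of scores enters the calculation, a strongly skewed scale (such as one with many zero scores) yields the same rho as any monotonic transformation of it.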

Construct validity was initially explored by means of a Principal Components Analysis (PCA) with Oblimin rotation. Bartlett’s test of sphericity was significant (χ²(210) = 3753, p < 0.001), indicating that the observed correlation matrix was suitable. Further, the Kaiser–Meyer–Olkin (KMO) measure indicated good sampling adequacy at 0.84.

Following this, model and item fit (construct validity) were tested using a polytomous Rasch model with joint maximum-likelihood estimation (JMLE). This was chosen in favour of confirmatory factor analysis (CFA) due to the assumptions of classical test theory (inherent in CFA, but not a Rasch model). These assumptions were considered less applicable to this kind of instrument: in particular, the assumption that each item contributes to the measurement of an underlying construct in an equivalent and summable way.
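As an illustration of what a polytomous Rasch model asserts, the sketch below computes category probabilities for a single item. The text does not say which polytomous variant was fitted, so the partial credit parameterisation used here is an assumption, and the threshold values in the usage note are hypothetical rather than estimates from the study.

```python
import math


def pcm_category_probs(theta, thresholds):
    """Category probabilities for one item under a partial credit model.

    theta      : person location in logits
    thresholds : step difficulties delta_1..delta_M in logits
    Returns probabilities for categories 0..M.
    """
    # The log-numerator for category k is the sum of (theta - delta_j) for j <= k;
    # category 0 has an empty sum, i.e. 0.
    log_numerators = [0.0]
    for delta in thresholds:
        log_numerators.append(log_numerators[-1] + (theta - delta))
    peak = max(log_numerators)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, thresholds):
    """Model-expected item score at a given person location."""
    probs = pcm_category_probs(theta, thresholds)
    return sum(k * p for k, p in enumerate(probs))
```

With hypothetical thresholds such as `[-1.0, 0.0, 1.0, 2.0]` for a 0 to 4 item, `expected_score` rises monotonically with `theta`, which is the sense in which higher person measures correspond to more frequent endorsement.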

The model analysis was performed on the three main subscales (victim, witness, perpetrator), which satisfied the criterion of unidimensionality (PCAR Eigenvalues all under 2.0 [69, 70]) so perpetration subscales identified in the PCA (aggressive and relational) were not analysed separately. The overall scale did not satisfy the criterion of unidimensionality, so it was also not analysed in a separate model. For analyses containing perpetration items, the two upper categories (‘often’ and ‘very often’) needed to be collapsed for all items due to an empty top category (no one chose “very often”) for one item (disinformation perpetration).
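The category collapsing described above amounts to a simple recode. The following sketch, with hypothetical response data and illustrative function names, first detects an unused response category and then merges ‘very often’ (4) into ‘often’ (3).

```python
from collections import Counter


def empty_categories(responses, n_categories=5):
    """Return the response categories (0..n_categories-1) that nobody chose."""
    counts = Counter(responses)
    return [c for c in range(n_categories) if counts[c] == 0]


def collapse_top(responses, new_top=3):
    """Recode every response above `new_top` down to `new_top`."""
    return [min(r, new_top) for r in responses]


# Hypothetical responses to one perpetration item: nobody chose 4 ('very often').
item_responses = [0, 0, 1, 0, 2, 3, 1, 0]
unused = empty_categories(item_responses)

# Applied to every perpetration item so that all share the same 0-3 scale.
recoded = collapse_top([0, 4, 3, 2])
```

Applying the same recode to all perpetration items, rather than only the affected one, keeps the response scale identical across the items of that subscale.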

To test concurrent validity, three global items were put to respondents on whether they had experienced each bullying role or not, based on a given extensive definition, and these were tested for association with the BEES (Mann–Whitney U-tests, Cohen’s d). To test convergent and discriminant validity, Spearman rank correlations were calculated between the respective BEES subscales and the comparison measures. All participants (n = 346) completed three measures: the BEES, the global bullying self-assessment (described above), and the peer support scale. To reduce response burden, participants were then randomly assigned so that about half (n = 179) completed the DASS-21 and the other half (n = 167) completed the SDQ on school experiences. These smaller sub-sample sizes (n = 179 and n = 167) were sufficient for correlations and tests of group differences, as confirmed by power analysis (two-tailed t-test with d = 0.5, power = 0.8, α = 0.05 required n = 128; regression with one predictor with f² = 0.15, power = 0.8, α = 0.05 required n = 54).
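The order of magnitude of the t-test figure can be checked with the standard normal approximation for a two-group comparison. This is a rough sketch, not the procedure used in the study: it replaces the noncentral t calculation with a normal approximation, so it gives a slightly smaller number than the total of 128 reported above. The function name is illustrative.

```python
import math
from statistics import NormalDist


def n_per_group_normal_approx(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-tailed, two-sample comparison.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, rounded up.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


per_group = n_per_group_normal_approx(0.5)
total = 2 * per_group
print(per_group, total)
```

The approximation yields 63 per group (126 in total); a t-based calculation typically requires about one more participant per group, which is consistent with the reported requirement of 128. The regression figure depends on the noncentral F distribution and is not reproduced here.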

jMetrik 4.4.1 was used for Rasch analyses, and jamovi 2.2.5 was used for all other statistical analyses.

3 Results

3.1 Construct validity

The initial PCA found four factors: a witnessing factor, a victim factor, and two perpetrator factors (Aggressive Perpetration, containing physical, intimidation and property items, and Relational Perpetration, containing disinformation, denigration, and exclusion items). Cross-loadings were present, especially between the perpetrator items and between the relational perpetrator scale and the corresponding witnessing items (Table 2).

Table 2 Principal components analysis (PCA) for the BEES and Rasch model item fit

Scale quality indices (item and person reliability, separation index) were favourable for each subscale (Table 3), except for the person indices on the Perpetration scale, which can be explained by the narrower range of responses on this scale, probably reduced further by collapsing to a 4-point scale on all perpetration items. All item-fit indices fell between 0.65 and 1.40 (Table 3), well within the bounds of 0.5–1.5 described by Linacre [71] as indicating good fit. For all further analyses, the two perpetration subscales were included in addition to the overall Perpetration scale.

Table 3 Rasch model scale quality indices for the BEES items

Internal consistency was analysed for all subscales and the overall scale. Good internal consistency as measured by Cronbach’s α was displayed across all BEES scales (Table 4). The SDQ scale ‘Conduct Problems’ had an alpha of 0.52 but was included as this likely reflects different ways in which these problems can manifest in the sample while still representing a meaningful underlying construct. The bully scales were intercorrelated (victim-witness: r = 0.50, p < 0.001; perpetrator-witness: r = 0.37, p < 0.001), except for perpetrator-victim. Interestingly, relational perpetration correlated with victimhood (r = 0.11, p < 0.05), but aggressive perpetration did not.
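For readers unfamiliar with the statistic, Cronbach's α for a k-item scale is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the total score. A minimal sketch follows; the data layout (one list of item responses per respondent) and the function name are assumptions for illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the respondents' sum scores.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

Alpha depends on both the number of items and their intercorrelations, so a short scale with heterogeneous content, such as a five-item subscale, can show a modest value even when its items tap a meaningful construct.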

Table 4 Descriptive and reliability statistics for all scales

3.2 Concurrent validity

To test concurrent validity, Mann–Whitney U-tests were used to compare BEES scores between self-identifiers and non-identifiers for each of the respective bullying roles. Rank-biserial correlations were used as a measure of effect size. For each bullying role, self-identification of involvement was associated with the corresponding BEES scale (Victimhood: U = 1447, p < 0.00001, r = 0.84; Witnessing: U = 6352, p < 0.00001, r = 0.48; overall Perpetration: U = 5763, p < 0.00001, r = 0.58; Aggressive Perpetration: U = 9177, p < 0.00001, r = 0.33; Relational Perpetration: U = 6235, p < 0.00001, r = 0.54).
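The relationship between the U statistic and the rank-biserial effect size can be sketched as below. This standard-library illustration omits the p-value and is not the software used in the study; the function names and the sign convention (positive when the first group scores higher) are assumptions.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_rank_biserial(group_a, group_b):
    """Return (U, r): the smaller U statistic and the rank-biserial correlation.

    r = (U_a - U_b) / (n_a * n_b), positive when group_a tends to score higher.
    """
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2   # pairs in which a exceeds b
    u_b = n_a * n_b - u_a
    return min(u_a, u_b), (u_a - u_b) / (n_a * n_b)
```

Here `group_a` would hold the BEES scores of respondents who self-identified with a role and `group_b` those who did not; an r close to 1 means that nearly every self-identifier outscored nearly every non-identifier.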

3.3 Convergent and discriminant validity

Regarding tests of convergent and discriminant validity, Spearman’s rank correlations are displayed in Table 5. Among the BEES measures, the victimhood subscale was negatively associated with peer support and positively associated with school age emotional problems, and peer problems. Again, in the BEES measures, perpetration was negatively associated with prosocial behaviour. Perpetration was also positively associated with school age conduct problems. Current emotional symptoms also correlated largely as hypothesized with the BEES scales, except that perpetration was not related to depression.

Table 5 Correlations between the BEES and the comparison scales

4 Discussion

4.1 Summary of results and implications

The present study aimed to assess the validity of the BEES and its applicability for retrospective reporting of school-age bullying experiences in adults.

4.1.1 Construct validity

The Rasch model analyses showed generally good scale quality and item fit, indicating that the pattern of responses for each item reflected good alignment with a respective underlying construct, namely each bullying role. There was a good spread of items across the ‘difficulty to endorse’ indices of all scales, which could be understood as ‘severity’ of bullying experience. For example, the Victimhood scale contained items describing experiences which reflected different points along the severity continuum of the same fundamental construct, namely, being targeted. An exception to the good scale quality indices was the person indices on the perpetration scale, which can be explained by the narrower range of responses on this subscale. This could be due to social desirability or self-selection effects [72, 73], where perpetrators may not volunteer or may underreport harmful behaviour when compared to peer or teacher reports [74, 75].

While the perpetration subscales (Aggressive, Relational) showed lower scale quality, probably influenced by lower item numbers, researchers may also choose to include these subtypes of perpetration separately, as their differential correlations may help us to understand more about the differences between these perpetrator groups. For example, Aggressive Perpetration was related to Conduct Problems and present-day depression, but not Victimhood; Relational Perpetration was related to Victimhood, but not Conduct Problems or present-day depression. Whilst the present study cannot explain these differences, other studies have commented on different types of perpetrators, their different motivations to bully, and better predicting subsequent outcomes (e.g. [73, 75]). The correlations amongst the bullying roles provide evidence for concurrent and/or transitioning occupation of different bullying roles, as has been identified in the literature [9, 54, 55, 76,77,78].

4.1.2 Concurrent, convergent and divergent validity

Confirming concurrent validity, the BEES scales were associated with the global self-reports of bullying experiences in each bullying role, with substantial effect sizes, supporting the idea that high scores on the BEES were strongly indicative of prior bullying experience.

Regarding convergent validity, the associations between the bullying roles and present-day depression, anxiety and stress were mostly replicated, with the exception that the Perpetration scale was not associated with depression. It is possible that there was a weaker association that was not detectable, as previous associations between perpetration and depression have been detected using similar items [56, 57]. The Victimhood scale was the bullying role most strongly associated with present-day depression, anxiety, and stress, consistent with research that has shown adult victim-survivors of bullying have a higher propensity to suffer from such psychological disturbance and distress [19, 34, 39, 42,43,44, 47, 49, 50]. The overall BEES scale was the strongest predictor of all forms of present distress, possibly suggesting that exposure to past bullying from the perspective of more than one role, which might occur where a school has a significant ‘bullying culture’, is particularly noxious, even into adulthood.

Hypotheses regarding school-age experiences were all confirmed, with victimhood associated with low levels of peer support, and as expected, with emotional problems and peer problems (as was witnessing). Perpetration was confirmed as being associated with low levels of prosocial behaviour and with conduct problems. As an aside, victimhood was also quite strongly associated with conduct problems, which would seem to be consistent with the intercorrelations amongst all BEES roles in both studies, as well as the high present day aggression levels found in victims previously [56, 57].

Support for almost all hypotheses provides good evidence for the overall validity of the BEES. Many of the hypotheses referred to victims and perpetrators due to these roles being more frequently identified as correlates of psychological disturbance. However, in this study, findings pertaining both to the witness role, and the overall (multiple role) scale suggest that these are important aspects of bullying experience related to both school-age and adult psychological disturbance. Other studies have demonstrated such effects for bystanders [38, 79] but less often have studies examined the experience of multiple roles.

5 Limitations

It should be noted that this study draws on online responses from participants with an average age of 26.3 years, with recruitment from specific online platforms, and as such, the sample may not be representative of other populations. Also, the retrospective reporting and cross-sectional design limit causal interpretation. However, we would point out that there is substantial theoretical [2] and empirical [19, 39] support for the idea that past bullying can influence present psychological functioning, and research suggests that stressful and traumatic events are often well remembered, including adult memory of childhood stressful events [80]. Further, the fact that the global measure of bullying experience and the BEES were administered in close proximity may have influenced findings on concurrent validity. Finally, it should be noted that a separate item on cyber-bullying was not included in the BEES. The BEES items may still capture bullying behaviours in the cyber context, and future research could explore this. There is ongoing debate as to whether considering cyber-bullying as a distinct phenomenon is helpful [81] or unhelpful [82].

6 Conclusions

The present study suggests that the BEES offers promise as an efficiently administrable measure of school-age bullying experiences for adults. A major strength is its ability to provide scores which differentiate between meaningful bullying roles (victim, witness, perpetrator) as well as between different kinds of bullying behaviours. This scale appears to be a valid measure for use with adults to understand bullying experiences during their school life. This provides a methodological option for researchers and clinicians where so far little has been available.