Introduction

Fieldwork is a critical tool for scientific research, particularly in applied natural and social science disciplines, including those integral to addressing urgent problems such as climate change1,2. Scientific fieldwork, broadly understood as research or data collection conducted outside of a conventional office or laboratory environment3, is also central to ecosystem management efforts by government agencies and nonprofit organizations, including monitoring of threatened or invasive species, assessments of habitat quality, and hands-on management efforts such as wildfire fighting. However, fieldwork is often unsafe, particularly for members of historically marginalized groups and those whose presence in scientific spaces threatens traditional hierarchies of power and authority4,5,6. Much attention has been paid in particular to the overt and subtle ways that women, gender minorities, racially marginalized groups, and other underrepresented individuals have been made to feel unwelcome and unsafe in scientific spaces generally7 and in field settings specifically8. Often these threats are interconnected, with racially and sexually marginalized groups such as women of color reporting higher rates of abuse at work and in science9,10,11.

Sexual harassment—sometimes referred to as sex-based or gender-based harassment because it is rarely sexually motivated12,13,14—and sexual assault are widely documented, as are the negative consequences on victims’ well-being and careers15,16. Research has also identified negative repercussions for the workplace as a whole, such as studies linking masculinity contest cultures (i.e., work environments that reward strength, overconfidence, and competitiveness) to lower psychological safety17 and organizational performance18. Sexual harassment can also negatively impact physical and mental health19,20, and the lasting mental and physical consequences of sexual assault are well-documented21,22,23.

Expanding field safety requires minimizing the risk of harassment, assault, and bullying. Addressing these concerns is important for employers and academic organizations that are legally and morally bound to protect their students and staff in the field. Effective action requires research to identify which types of interventions work to prevent field-based sexual harassment and assault. Prevention can be understood as a wider and more productive lens than response, because it seeks to protect field participants from being victimized in the first place. Prevention includes precautionary measures such as the creation of substantive policies8, systematic attention to positive organizational culture17, and meaningful training of participants and staff in how to model best practices for field safety24. These trainings may include recommendations for how to respond effectively to incidents of harassment and assault, including establishing confidential reporting channels, sanctioning perpetrators, and protecting victims and reporters from retaliation. Beyond simply preventing risk, these trainings and recommendations may be doubly beneficial because they can send an early signal to establish a field culture that discourages future hostile behaviour10. While prevention trainings are becoming more commonplace in traditional academic and policy settings, few prevention initiatives specifically target the unique high-risk setting of fieldwork25.

An important target for harassment prevention programs is the perceived ability to execute desired behaviours, termed self-efficacy24,26. Training programs that incorporate a bystander intervention approach (i.e. active helping by third parties who observe an incident) have been associated with greater self-efficacy27,28 which in turn has been associated with actual behaviours27,29,30. Beyond individual self-efficacy, a second important goal for harassment prevention programs is collective efficacy, or the perceived ability of a community to execute desired behaviours31,32. Finally, increasing knowledge of sexual harassment and assault definitions and resources is a baseline goal for most prevention training programs33,34.

This study documents a collaboration between scientists at academic institutions and decision-makers at a government agency to administer and evaluate a sexual harassment and assault prevention training program (known as Building a Better Fieldwork Future training, or BBFF training) led by trained agency staff. We used the agency rolling out this training to several hundred staff as a quasi-experiment: participants were not randomly assigned, but we compared before/after changes within subjects and used staff who had not yet been trained as a control. The training itself included knowledge-based interventions, social modeling, and mastery experiences, all of which have been linked to greater self-efficacy elsewhere35,36,37 but which have yet to be studied in sexual harassment and assault prevention. Our research questions [RQs] are:

  1. RQ1

    Does participation in the training increase participants’ capacity to take action to create inclusive, safe field environments?

    Hypothesis H1a: Post-training prevention self-efficacy and knowledge will increase significantly in the intervention group compared to the control.

    Hypothesis H1b: Changes in primary outcomes will not be sustained over time.

  2. RQ2

    Does participation in the training increase participants’ actions to create inclusive, safe field environments?

    Hypothesis H2a: Post-training prevention behavioural intention will increase significantly in the intervention group compared to the control.

    Hypothesis H2b: Changes in primary outcomes will not be sustained over time.

  3. RQ3

    Does the training work equally well for all demographic groups?

    Hypothesis H3a: Increases in post-training knowledge, self-efficacy, and behavioural intention will be higher for women and gender minorities compared to men.

    Hypothesis H3b: No significant differences will be observed in post-training outcomes based on gender, age, race/ethnicity, role, region, or time at the agency.

    Hypothesis H3c: Increases in post-training behavioural intention and self-reported behaviour will be higher for staff who reported higher levels of pre-training prevention behaviour and prevention personal norms compared to their less engaged and committed peers.

  4. RQ4

    Do reporting rates increase after participants receive information about sexual harassment and assault?

    Hypothesis H4a: Post-training confidence in reporting and likelihood to report an incident of sexual harassment and assault will be higher in post-training surveys than pre-training.

In addition to these core research questions, we also determined the baseline level of participation in and support for sexual harassment and assault prevention actions at the agency.

Methods

Pilot data

A pilot study assessing the efficacy of the BBFF harassment prevention program involved the distribution of surveys to participants prior to and after completing the training. These surveys asked participants to rate their degree of agreement on a Likert scale with the following four statements, which were informed by previous evaluative research29,38,39 on self-efficacy and knowledge outcomes of sexual harassment and assault bystander intervention training:

  1. I feel knowledgeable about existing resources to help me prevent, intervene in, and report sexual harassment and assault in field settings;

  2. I feel confident in my ability to prevent sexual harassment and assault in field settings;

  3. I feel confident in my ability to intervene in an incident of sexual harassment and assault in a field setting; and

  4. I feel confident in my ability to report an incident of sexual harassment and assault in a field setting.

Surveys were collected immediately prior to and after participating in the BBFF training from 2019 to 2021. Paired pre-post results demonstrated that participants (n = 181) reported significantly (1) greater knowledge about resources to prevent sexual harassment and assault in field settings; and greater confidence in (2) preparing for, (3) intervening in, and (4) reporting sexual harassment and assault in scientific fieldwork settings after completing the BBFF training (Wilcoxon signed-rank tests, p < 0.001, Cronin et al., unpublished data). This pilot study was conducted using mainly academic scientists and practitioners conducting fieldwork, including researchers, university faculty, and graduate and undergraduate students. This narrow study examined a small pool of respondents and only asked four questions related to prevention knowledge and self-efficacy. Still, results suggested that the BBFF training has benefits for self-reported measures and provides the foundation for the deeper analysis described here.

Study design

We ran a pre-post, non-randomized intervention study of a training delivered to staff at the California Department of Fish and Wildlife (CDFW). CDFW is a large state agency responsible for managing and protecting the state's wild plants, animals and ecosystems. As part of this mission, CDFW requires employees across its seven regions in the state to engage in substantial fieldwork. We used both between-subjects and within-subjects comparisons to determine impact of the training on participants. Training participants were CDFW employees who participate in or manage field science and research. This includes scientific aides, who collect fishery data and biological samples at a range of field sites, the direct supervisors of scientific aides, and the regional managers of each of the seven regions. All staff had previously received mandatory sexual harassment prevention training (which is not specific to fieldwork settings) in relevant CDFW policies and reporting protocols.

A small group (17) of CDFW staff who were nominated by their supervisors from each region were trained by the BBFF Program to become Certified Instructors via a “train the trainers” session consisting of a two-day intensive virtual workshop where instructors received information on content, facilitation, and common questions, followed by a practice session during which instructors delivered a mock training to BBFF staff. These instructors then delivered the same 90-min training to CDFW staff in their region. Trainings for CDFW staff were supervised by experienced BBFF staff.

Each training followed a predetermined script with the same content and lasted 90 min. The training had five major components: (1) introduction to harassment and assault in fieldwork, (2) preparation, (3) intervention, (4) reporting, and (5) scenario-based discussions. In component 1 (introduction), the facilitator explained why fieldwork settings are particularly high-risk for sexual harassment and assault and outlined the legal and institutional definitions of harassment and assault. In component 2 (preparation), the facilitator shared best practices for field-ready protocols, including Codes of Conduct, community agreements, Field Safety plans, and privacy, medical, and other protocols. In component 3 (intervention), participants learned basic bystander intervention tools. In component 4 (reporting), the facilitator explained the importance of reporting, reporting requirements for staff, and how the reporting process works at CDFW. The scenario-based discussions were interspersed throughout the training; each related to the information and skills learned in the preceding section, and they ranged in severity from instances of gender bias to assault in field settings (Box 1). Participants were broken into small groups of three to five members to discuss each scenario for four to five minutes. After the small-group discussion, the instructor facilitated a larger-group discussion for five minutes. Trainings were delivered via a virtual meeting platform.

The training design incorporated three micro-intervention strategies that have been shown elsewhere to improve self-efficacy: knowledge-based interventions, social modeling, and mastery experiences31,36,37. Knowledge-based interventions are designed to increase knowledge about harassment and assault so that participants are more likely to identify it when they observe it38,39,40. Social modeling suggests that people learn to imitate others by replicating their intended and observed behaviours. Finally, mastery experiences are the personal experience of success, whereby participants take on a new challenge and feel that they have succeeded at it, thus building self-efficacy in that area31. Components 1–4 of the training constituted the knowledge-based intervention, and social modeling and mastery experiences occurred in the scenario-based discussions.

Survey questions included in this study draw on previously validated survey questions from the Bystander Efficacy Scale, which can be used to assess an individual’s confidence in performing bystander behaviours27,28. Survey questions were not replicates of those used in the scale, but gauge confidence about and self-reported likelihood to intervene related to eight main themes as outlined by the scale and similar research on bystander intervention evaluation26,27,29,30,38,39,41: knowledge, self-efficacy, personal norms, collective efficacy, self-reported behaviour, behavioural intention, observed behaviour, and training attitude. To adapt our questions to the setting of a government agency, response options were simplified to a seven-point Likert scale rather than the 0–100 used by the original scale (Table S1).

Sampling design

Staff from each of the seven CDFW regions and the CDFW Office of Spill Prevention and Response were trained from March to June 2022. Each training included 30–40 participants and was led by a certified instructor. Given the restricted nature of field-based employees’ schedules and limited access to reliable WIFI and cell service, random assignment to training sessions was not possible; participants were assigned opportunistically to training sessions based on their availability.

The program was piloted with a single training in Southern Coastal California (Region 5), which includes 38 field staff. Trainings were then delivered region by region, in an order determined through discussions with the six other regional managers within CDFW. Trainings were delivered in two batches to create a control group for the quasi-experimental design: the first half of CDFW staff were trained from April 13th to July 5, 2022, and the second half was trained from July 15th until September 25, 2022. At the halfway point, post-training data were collected from all staff, allowing for comparison between the intervention group (those who had been trained) and the control group (those who had not yet been trained). Due to constraints associated with staff availability, survey data collection began prior to acceptance of the initial study protocol in the Stage 1 manuscript; however, the data were not accessed before that point, and no data collection protocols were altered after starting data collection.

Surveys were distributed in three waves: Time 1, Time 2, and Time 3, such that treatments are nested within individuals (Table S2). Roughly one week prior to taking the training, participants were invited to complete the Time 1 (pre-training) survey establishing baseline behavioural beliefs, self-reported past behaviour, behavioural intention, and demographics. The invitation email contained a link to the survey platform Qualtrics hosted by UC Santa Cruz. Survey recipients who did not complete the survey received a reminder email roughly two days before their training.

After completing the pre-training baseline survey, participants were contacted by a BBFF Program coordinator to sign up for a 90-min training offered within their region. Immediately after each training, participants were then sent a link to the Time 2 (immediate post-training) survey of behavioural beliefs and intention. Participants received at least one reminder to complete the post-training survey.

At the halfway point (July 6–July 14, 2022), the Time 3 (midpoint) survey was sent to all staff to measure behavioural beliefs, self-reported past behaviour, and behavioural intention and to allow comparison between those who had been trained (treatment group) and those who had yet to be trained (control group). The link to the follow-up survey on Qualtrics was sent via email. Survey recipients received one reminder email a week later. For a full list of predictor, outcome, and control variables measured in each survey wave, see Table S1.

The survey data were supplemented by incident reporting data provided by the CDFW Office of Equal Employment Opportunity (EEO). After the training program was completed, EEO staff provided aggregated deidentified counts of reported incidents related to sexual harassment from January 2020 to October 2022 (two months after the end of the study period). Reports were coded as either field-based or not and grouped into general categories of incident type (e.g., derogatory comments, hostile work environment). For reports received after the training program began, EEO staff indicated whether the person who reported the incident was previously trained by the BBFF program. Because incident reports related to sexual harassment are infrequent at CDFW (fewer than 10 reports per year), these results were not used to test a hypothesis, but were included as anecdotal context in our results.

Analysis plan

To determine the effect of the trainings on our primary outcomes (RQ1 and RQ2), we ran adjusted and unadjusted linear regressions with post-training knowledge, self-efficacy, prevention behavioural intention and self-reported prevention behaviour as outcome variables (measured 1–2 months after the BBFF training; Table 1). The preregistered protocol specified that adjusted regressions including demographic variables would match pre-training beliefs and self-reported behaviours; however, due to constraints associated with sample size, only unadjusted regressions could be conducted for treatment–control data. Specifically, we used data from treatment and control groups to determine if treatment condition was a significant predictor of the outcome variable at Time 3 when controlling for that variable at Time 1 (unadjusted regression) or when controlling for that variable and for demographics and the other outcome variables at Time 3 (adjusted regression). We conducted ordinal logistic regressions as a sensitivity analysis, given that our outcomes were measured on 7-point scales. The training experience was coded as a binary variable, with the control (not yet trained) as 0 and the treatment (training received) as 1.
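To make the structure of the unadjusted model concrete, the following minimal sketch fits Time 3 score ~ intercept + Time 1 score + treatment indicator by ordinary least squares. It is an illustration under stated assumptions, not the authors' analysis code: the statistical software used in the study is not named in the text, the helper `ols` is hypothetical, and the seven records are synthetic, noise-free values constructed so that the treatment coefficient is 0.8.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    Hypothetical helper for illustration; solved by Gauss-Jordan
    elimination with partial pivoting, which is adequate for a few predictors.
    """
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic records: (Time 1 score on a 7-point scale, treatment indicator).
# Treatment coding follows the text: control = 0, trained = 1.
records = [(4, 0), (5, 0), (6, 0), (3, 1), (4, 1), (5, 1), (6, 1)]

# Assumed data-generating rule for the demonstration only (no noise).
time3 = [1.0 + 0.5 * t1 + 0.8 * treated for t1, treated in records]

X = [[1.0, float(t1), float(treated)] for t1, treated in records]
b_intercept, b_time1, b_treat = ols(X, time3)
print(f"treatment effect at Time 3, controlling for Time 1: {b_treat:.2f}")
```

The coefficient on the treatment indicator plays the role of the β values reported for the treatment–control comparisons; an adjusted model would add further columns to `X` for demographics and the other outcome variables.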

Table 1 Design table.

We prevented overfitting in our models by checking that we had 20 or more observations per variable. If this threshold was not met, we pre-screened potential covariates using a bivariate likelihood ratio test with the outcome, and included covariates with a p value less than 0.20. If we found that the training significantly influenced both post-training perceptions (knowledge and self-efficacy) and behavioural measures (behavioural intention and self-reported behaviour), we ran mediation analyses to determine if changes in perception mediated changes in prevention behaviour. Our original registered report design planned for roughly 250 participants in each experimental condition (intervention and control) and, assuming a standard deviation of 1, was powered to detect a difference in means with an effect size of 0.32 for continuous outcomes with an alpha of 0.05 and power 0.95. This is within the range of effect sizes detected in other studies of sexual assault prevention knowledge, self-efficacy and collective efficacy in non-scientific settings33,42,43. However, due to difficulties obtaining survey responses from participants (see “Study limitations and future research” section in the “Discussion”), our final sample size for treatment–control analyses was 140 participants who completed both the Time 1 and Time 3 surveys, 80% (n = 112) of whom were in the treatment group and 20% (n = 28) of whom were in the control group, giving us power to detect an effect size of 0.76 (large effect size) for continuous outcomes with an alpha of 0.05 and power 0.95. Thus, we were underpowered to detect smaller effect sizes for the treatment–control analyses, but remained powered to detect large differences in score changes between the study groups.
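The two detectable effect sizes quoted above can be checked with a short calculation. The sketch below assumes a two-sided, two-sample comparison of means under a normal approximation with a standard deviation of 1; the text does not state which power software or exact test the authors used, so the function name `min_detectable_effect` and the approximation itself are assumptions, and a t-based calculation may differ slightly.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n1: int, n2: int, alpha: float = 0.05, power: float = 0.95) -> float:
    """Smallest standardized mean difference detectable by a two-sided,
    two-sample comparison (normal approximation, SD assumed equal to 1)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return (z_alpha + z_power) * sqrt(1 / n1 + 1 / n2)


planned = min_detectable_effect(250, 250)  # planned design: 250 per condition
achieved = min_detectable_effect(112, 28)  # final treatment and control groups
print(f"planned: {planned:.2f}, achieved: {achieved:.2f}")
```

Under these assumptions the planned design yields roughly 0.32 and the final, unbalanced sample roughly 0.76, matching the values in the text. Most of the loss in sensitivity comes from the small control group, since the 1/28 term dominates the square root.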

We measured longitudinal change within subjects using mixed effects models (also known as multilevel models) to compare responses at three time points for data nested within an individual (RQ1, RQ2, Times 1, 2, and 3)44. We used these models to compare the effectiveness of the trainings for different groups within CDFW (RQ3). The preregistered protocol specified that we would run additional ANOVA comparisons; however, it was determined that multilevel models were more appropriate to test for differences between time points for the data. Using within-subjects comparisons, we looked for differences in training outcomes by gender, race/ethnicity, age, seniority (years employed at CDFW and position within the agency), and region. For race/ethnicity analyses, demographic data were coded as a binary variable (Yes = 1, No = 0) for under-represented minority (URM) status, with Yes assigned to respondents who identified primarily as African American/Black, American Indian/Alaskan Native, and/or Hispanic/Latino. Respondents who chose not to specify demographic data were not included in these analyses. We ran moderation analyses to determine if pre-training prevention behaviour and beliefs moderated the impact of the training on post-training behavioural intention and self-reported behaviour.

The preregistered protocol specified that we would run tests for collective efficacy and prevention behaviour (self-reported) using treatment–control data (Table 1); however, our Time 1 and Time 3 survey questions did not adequately measure these variables (Table S1); thus these tests were omitted.

Exploratory analyses

Given sample size limitations for treatment–control data that prevented the use of adjusted regressions, we used the larger within-subjects dataset to conduct adjusted regressions for the effects of race and gender on change between pre- and post-training scores related to knowledge, self-efficacy, and behavioural intention.

Ethical approval

The research complied with all relevant ethical regulations, and a University of California (UC) Santa Cruz Internal Review Board (IRB) permit was received for research involving human subjects (UCSC #HS-FY2022-226). Informed consent was sought from all participants prior to data collection.

Results

Sample groups

A total of 44 trainings were delivered to 925 CDFW employees between April and August 2022. After filtering out empty and duplicate surveys, a total of 1048 completed surveys representing 630 participants remained across the three waves (Table S3). We used three datasets to conduct analyses. (1) For treatment–control comparisons, we matched 140 participants who completed both the pre-training survey (Time 1) prior to the midpoint and the midpoint survey (Time 3), 80% (n = 112) of whom were in the treatment group and 20% (n = 28) of whom were in the control group. (2) For longitudinal analyses using multilevel models, we matched 64 individuals who completed all three survey waves and were in the treatment group for the Time 3 survey (so that the immediate post-training survey and 1–2 months post-training surveys were both completed after receiving the training). (3) For within-subjects pre- and post-training comparisons, we matched 196 participants who completed both pre-training (Time 1) and post-training (Time 2) surveys (Tables S2, S3); of these, surveys that included responses for gender (n = 173) and race (n = 165) were used for exploratory analyses.

For questions that were asked separately for harassment and assault, we combined highly correlated responses (Cronbach’s alpha > 0.8) into a single mean score (Fig. S1).
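As an illustration of this grouping rule, the sketch below computes Cronbach's alpha with the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), and averages the items into one score when alpha exceeds 0.8. The six response pairs are invented for the example and are not study data; the function name `cronbach_alpha` is likewise hypothetical.

```python
from statistics import mean, variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Cronbach's alpha, where items[i][j] is respondent j's answer to item i."""
    k = len(items)
    totals = [sum(answers) for answers in zip(*items)]  # per-respondent sum
    item_variance = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Invented 7-point responses from six respondents to the harassment and
# assault versions of the same question (illustrative only).
harassment = [5, 6, 4, 7, 3, 6]
assault = [5, 7, 4, 6, 3, 6]

alpha = cronbach_alpha([harassment, assault])

# Combine the two versions into a single mean score only if they are
# sufficiently consistent, following the 0.8 threshold given in the text.
if alpha > 0.8:
    combined = [mean(pair) for pair in zip(harassment, assault)]
    print(f"alpha = {alpha:.2f}; combined scores: {combined}")
```

With only two items, alpha is driven entirely by how strongly the harassment and assault versions covary, which is why a high threshold is a reasonable condition for collapsing them.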

Does participation in the training increase participants’ capacity to take action to create inclusive, safe field environments?

H1a. Post-training prevention self-efficacy and knowledge will increase significantly in the intervention group compared to the control.

Linear regression results indicate significant increases in the treatment group compared to the control group in self-reported knowledge and prevention and intervention self-efficacy (p < 0.05, Table 2, Fig. 1a). The largest effect sizes were for knowledge (β = 0.84, 95% CI [0.30, 1.38], p = 0.003) and prevention self-efficacy (β = 0.74, 95% CI [0.24, 1.23], p = 0.004). We did not find significant differences between the treatment and control groups in the change in the other two forms of self-efficacy (reporting and encouraging others, Table 2, Fig. 1a), indicating partial support for Hypothesis 1a.

H1b. Changes in knowledge and self-efficacy will not be sustained over time.

Table 2 Linear model results for treatment (n = 112) and control group (n = 28) responses for survey questions for sexual harassment and assault related to knowledge and self-efficacy, and behavioural intention at two time points: pre-training (Time 1) and 1–2 months post-training (Time 3) (n = 140).
Figure 1 Linear model results for treatment and control group responses for survey questions for sexual harassment and assault related to (a) knowledge and self-efficacy, (b) behavioural intention, and (c) self-reported behaviour at two time points: pre-training (Time 1) and 1–2 months post-training for treatment and control groups (Time 3) (n = 140).

Within-subjects comparisons revealed significant increases in self-reported knowledge (β = 1.1, 95% CI [0.75, 1.45], p < 0.001) and all forms of self-efficacy immediately after the training (β = 0.53–0.69, 95% CI [range 0.26–0.99], p < 0.001, Table 3, Fig. 2a). This effect was sustained in our within-subjects comparison 1–2 months after the training delivery for self-reported knowledge (β = 0.61, 95% CI [0.24, 0.97], p < 0.001) and prevention, intervention, and encouragement self-efficacy (β = 0.18–0.45, 95% CI [range 0.05–0.76], p < 0.01). We did detect a drop-off in the intensity of the effect at the 1–2-month mark, but the increase was still significant compared to baseline (p < 0.01, Table 3). In contrast, for reporting self-efficacy, scores returned to baseline 1–2 months after training delivery. This indicates a rejection of Hypothesis 1b for all variables except reporting self-efficacy.

Table 3 Longitudinal multilevel model results (RQ1, Hypothesis H1a and RQ2 Hypothesis H2a) for variables related to knowledge, self-efficacy (RQ1), and behavioural intention (RQ2) concepts between scores pre-training (Time 1), immediately post-training (Time 2), and 1–2 months post-training (Time 3).
Figure 2 Longitudinal multilevel models showing survey responses at three time points within-subjects: pre-training (Time 1), immediately post-training (Time 2), and 1–2 months post-training (Time 3) for survey questions related to (a) knowledge and self-efficacy and (b) behavioural intention (n = 64).

Does participation in the training increase participants’ actions to create inclusive, safe field environments?

H2a. Post-training prevention behaviour (self-reported) and behavioural intention will increase significantly in the intervention group compared to the control.

There were no differences detected in our linear regressions between treatment and control groups for any form of behavioural intention (Fig. 1b), nor for most forms of self-reported behaviour, except for intent to encourage others to take action, which was significantly greater for the treatment group versus control (β = 0.49, 95% CI [0.04, 0.94], p = 0.034, Table 2, Fig. 1c). However, scores for these questions were extremely low with little variation for both time points and treatment groups, suggesting floor effects (questions measured frequency of intent to act, with means in the range of 1.15–1.57 on a seven-point scale). Thus, we reject Hypothesis 2a.

H2b. Changes in behavioural intention will not be sustained over time.

Although we detected almost no differences between treatment and control groups, within-subjects comparisons revealed significant increases in all forms of behavioural intention immediately after the training (β = 0.85–1.32, 95% CI [range 0.36–1.82], p < 0.001, Table 3, Fig. 2b). Unlike results for knowledge and self-efficacy, significant increases in behavioural intention were sustained 1–2 months after the training delivery only for reporting intention (β = 0.58, 95% CI [0.09, 1.07], p = 0.02), while prevention, intervention, and encouragement intention all returned to levels not significantly different from baseline. These results indicate support for Hypothesis 2b for all forms of behavioural intention except reporting.

Does the training work equally well for all demographic groups?

H3a. Increases in post-training knowledge, self-efficacy, and behavioural intention will be higher for women and gender minorities compared to men.

Our within-subjects comparisons indicated that increases in knowledge immediately after the training were higher for women (n = 88) than men (n = 82) (β = 0.8, p = 0.004, Table 4, Fig. 3). However, there was no detectable gender difference in changes for scores related to self-efficacy and behavioural intention (Fig. 3a,b). Thus, Hypothesis 3a can be accepted for knowledge but not for self-efficacy or behavioural intention.

Table 4 Linear model results comparing pre-training (Time 1) and post-training (Time 2) responses among gender (n = 173) and race (n = 165) groups for survey questions for sexual harassment and assault related to knowledge, self-efficacy and behavioural intention.
Figure 3 Change in scores related to knowledge and self-efficacy for men (n = 82) and women (n = 88) pre-training compared to post-training. Colored asterisks denote significant differences between scores within gender groups; black asterisks denote significant differences in changes in scores between gender groups (i.e., an interaction effect).

No respondents reported identifying as non-binary, transgender, or other gender minorities; thus effects for other gender minorities could not be investigated.

H3b. When controlling for gender, no significant differences will be observed in post-training outcomes based on age, race/ethnicity, role, region, or time at the agency.

Within-subjects linear regressions comparing pre- and post-training data failed to detect significant differences in change in knowledge, self-efficacy, or behavioural intention based on age, race/ethnicity, education level, tenure at CDFW, occupation, or region when controlling for gender (p > 0.05, Table S4, n = 196, Fig. S2). Thus we find support for hypothesis H3b.

H3c. Increases in post-training behavioural intention will be higher for staff who reported higher levels of pre-training prevention behaviour and prevention personal norms compared to their less engaged and committed peers.

We used a correlated composite variable for self-reported pre-training prevention behaviour (Cronbach’s alpha = 0.91). Due to the prevalence of scores centered around the mean (score = 1.18), the data was categorized into two groups: “high prevention behaviour” (composite score > 1.2; n = 59) and “low prevention behaviour” (composite score < 1.2; n = 121). Using a correlated composite value for personal norms related to both harassment and assault (Cronbach’s alpha = 0.96), respondents were divided around the mean (composite score 6.13) into “high personal norm” (composite score range 6.25–7; n = 98) and “low personal norm” groups (composite score range 4–6.13; n = 78).

No differences in the change in behavioural intention were detected in our moderation analyses (linear regressions) between the high prevention behaviour and low prevention behaviour groups (β = − 0.26 to 0.26, 95% CI [range − 0.85 to 0.86], Table S5). Similarly, no differences were detected between norm groups for change in behavioural intention after the training (β = − 0.38 to 0.36, 95% CI [range − 0.9 to 0.6], Table S5). Thus, we can reject hypothesis H3c.

Do reporting rates increase after participants receive information about sexual harassment and assault?

H4a. Confidence in reporting and likelihood to report an incident of sexual harassment and assault will be higher in post-training surveys than in pre-training surveys.

Reporting self-efficacy and behavioural intention were not significantly different between treatment and control groups (Fig. 1). However, our mixed effects models found that within subjects, both reporting self-efficacy (β = 0.53, 95% CI [0.26, 0.79], p < 0.001) and intention (β = 0.91, 95% CI [0.43, 1.39], p < 0.001) increased immediately after training (Table 3, Fig. 2). While this increase was sustained for reporting intention (β = 0.58, 95% CI [0.09, 1.07], p = 0.02), reporting self-efficacy returned to baseline 1–2 months after training (β = 0.18, 95% CI [− 0.10, 0.45], p = 0.208; Table 3, Fig. 2). These results partially support H4a.

On the other hand, sexual harassment and assault incident report data provided by CDFW indicated that only one sexual harassment complaint was filed from April to August in the year prior to the study period (2021). During the same period the following year (2022, the study period), three sexual harassment complaints were filed with CDFW, though only one of these complaints was filed by an employee who had participated in a BBFF training. The small number of incident reports prevented quantitative analyses of the change in reports filed.

Exploratory analyses

Because of small sample sizes for RQ1, RQ2, and RQ4, we ran additional linear regressions using pre-post training data (n = 196) as a validity check for our results. Consistent with previous pilot data, results indicated significant increases in all forms of knowledge (β = 1.17, 95% CI [0.9, 1.42], p < 0.001), self-efficacy (β = 0.53–0.6, 95% CI [range 0.29–0.85], p < 0.001), and behavioural intention (β = 0.72–0.87, 95% CI [range 0.38–1.2], p < 0.001) immediately after the training compared to participants’ pre-training scores (Table S6).
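To make the pre-post comparison above concrete, the sketch below estimates a mean within-person change and an approximate 95% confidence interval from paired scores. It is a simplified stand-in rather than the authors' regression specification: the function name `paired_change_ci` and the scores are hypothetical, and the interval uses a normal approximation instead of a t distribution, which is only reasonable for larger samples.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_change_ci(pre, post, level=0.95):
    """Mean post-minus-pre change with a normal-approximation confidence interval."""
    diffs = [after - before for before, after in zip(pre, post)]
    estimate = mean(diffs)
    std_err = stdev(diffs) / sqrt(len(diffs))
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, (estimate - z * std_err, estimate + z * std_err)


# Hypothetical 7-point Likert scores for six respondents
pre_scores = [3, 4, 2, 5, 4, 3]
post_scores = [4, 5, 4, 5, 5, 4]

change, (ci_low, ci_high) = paired_change_ci(pre_scores, post_scores)
print(f"change = {change:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

An interval that excludes zero corresponds to a significant pre-post change at the chosen level, which is how the confidence intervals reported in this section can be read.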

Individual variability in the effect of the training

Within-subjects longitudinal comparisons suggested considerable individual variability in the predicted effects of the training on knowledge, self-efficacy, and behavioural intention that was not captured by the fixed effects of the models (random effects ranging from τ00 = 0.44–1.15; Table 3). To further investigate this variability, we conducted more detailed analyses of gender and race, particularly for changes in knowledge, self-efficacy, and behavioural intention.
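For readers unfamiliar with the notation, one common way to write a random-intercept model of this kind is shown below. This is an assumed, simplified specification for illustration only: the exact covariates and time coding used in the fitted models are not stated in this section, and the indicator names Post and FollowUp are hypothetical labels for the immediate post-training and 1–2 month follow-up surveys.

```latex
y_{it} = \beta_0 + \beta_1\,\mathrm{Post}_{it} + \beta_2\,\mathrm{FollowUp}_{it} + u_{0i} + \varepsilon_{it},
\qquad u_{0i} \sim \mathcal{N}(0, \tau_{00}),
\qquad \varepsilon_{it} \sim \mathcal{N}(0, \sigma^{2})
```

Here y_{it} is the score of participant i at survey wave t, the β terms are the fixed effects of time, and τ00 is the variance of the participant-specific intercepts u_{0i}. A larger τ00 indicates that participants differ more from one another in their overall score level than the fixed effects alone would predict.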

Effects of race and gender

In addition to gender differences in score change (RQ3), we investigated the effect of gender on within-subjects responses for knowledge and self-efficacy. While both men (n = 82) and women (n = 91) demonstrated significant increases post-training in all forms of knowledge and self-efficacy compared to pre-training (β = 0.4–0.82, 95% CI [range 0.02–1.21], p < 0.05, Table 4), women consistently reported significantly lower scores than men, both before and after training, for both knowledge (β = − 1.02, 95% CI [− 1.4, − 0.64], p < 0.001) and self-efficacy (β = − 0.43 and − 0.91, 95% CI [range − 1.27 to − 0.05], p < 0.05, Table 4, Fig. 3). There was no significant difference between genders for scores related to behavioural intention.
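The distinction drawn above between a group difference in score level and a group difference in score change can be sketched with simple cell means. In a two-group, two-time-point design, the interaction term of a linear model with group, time, and group × time predictors equals the difference between the two groups' mean changes. The function name `two_by_two_effects` and all scores below are hypothetical and do not come from the study data.

```python
from statistics import mean


def two_by_two_effects(pre_a, post_a, pre_b, post_b):
    """Summarize time, group, and interaction effects from group-by-time cell means."""
    change_a = mean(post_a) - mean(pre_a)  # time effect within group A
    change_b = mean(post_b) - mean(pre_b)  # time effect within group B
    return {
        "change_a": change_a,
        "change_b": change_b,
        # Group difference at baseline (B minus A)
        "baseline_gap": mean(pre_b) - mean(pre_a),
        # Difference in change between groups, i.e. the interaction
        "interaction": change_b - change_a,
    }


# Hypothetical scores: both groups improve by the same amount,
# but group B starts lower and stays lower.
effects = two_by_two_effects(
    pre_a=[5, 6, 5, 6], post_a=[6, 7, 6, 7],
    pre_b=[4, 5, 4, 5], post_b=[5, 6, 5, 6],
)
print(effects)
```

In this toy example the interaction is zero while the baseline gap is not, which mirrors the pattern of similar gains but persistently different score levels.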

We also investigated the effect of race and ethnicity on within-subjects pre- and post-training responses for knowledge, self-efficacy, and behavioural intention. We compared respondents who identified as white (n = 110) to those with one or more non-white racial or ethnic identities, grouped together as underrepresented minority (URM) respondents (n = 55). While both groups reported significant increases in knowledge and self-efficacy from pre-training to post-training (β = 0.59–1.22, 95% CI [range 0.25–1.56], p < 0.001), URM respondents consistently reported significantly lower scores than white respondents for knowledge (β = − 0.47, 95% CI [− 0.89, − 0.05], p = 0.028) and all forms of self-efficacy (β = − 0.4 to − 0.6, 95% CI [range − 0.90 to 0], p < 0.05, Table 4, Fig. 4). On the other hand, URM respondents reported higher behavioural intention than white respondents for intervening (β = 0.67, 95% CI [0.09, 1.24], p = 0.023) and reporting (β = 0.60, 95% CI [0.03, 1.17], p < 0.05, Table 4).

Figure 4

Change in within-subjects scores related to knowledge and self-efficacy for underrepresented minority (URM, n = 55) and white respondents (n = 110) pre-training compared to post-training. Colored asterisks denote significant changes between pre-training and post-training scores within each group.

Discussion

Training appears to boost sexual harassment and assault prevention knowledge and self-efficacy, but effects on behaviour remain unclear

Sexual and gender-based harassment and assault are pervasive in scientific and natural resource fieldwork, but the recent development of training and intervention programs seeks to reduce their prevalence and empower field scientists and students. However, many training programs seeking to reduce harassment in the workplace fail to produce results, and many have even backfired because they induce defensiveness, greater acceptance of harassing behaviours in perpetrators, and/or retaliation against victims who complain, leading to worker disaffection and turnover33,45. In this study, we examined the impact of an interactive, peer-based, fieldwork-focused harassment and assault prevention training program delivered to staff of CDFW, a US state natural resource agency. Our findings demonstrate both immediate increases and longer-term persistence in three established precursors of action: self-reported knowledge, self-efficacy, and to a lesser extent behavioural intention. These results suggest the potential positive impact of a relatively short-duration training program in contributing to broader organizational efforts to end harassment and assault risk within the high-risk setting of scientific fieldwork.

While these results suggest promising outcomes for post-training increases in knowledge and self-efficacy, the long-term effect of the training on behavioural intention was weaker, a pattern consistent with other assessments of training interventions46. It also aligns with research identifying knowledge and self-efficacy as necessary but not sufficient precursors to behavioural intention, so training efforts may need to consider other constructs such as social norms, attitudes, and identity to impact behavioural intention41,47. Given that the BBFF training we assessed lasts only 90 min, further research is needed to investigate the effects of longer-duration trainings; in particular, multiple studies have suggested four hours as a minimum duration for long-term training effectiveness48,49,50 (though other studies have suggested that diversity training content is more important than duration in determining outcomes33). Regardless, these short-duration trainings can be viewed as a way to initiate deeper efforts towards harassment and assault prevention action, as they help build knowledge and self-efficacy among participants toward an immediate goal of behaviour change and an ultimate goal of organizational culture change. Our results further suggest that trainings that seek to improve bystander efficacy may be strengthened by integrating mastery experiences, knowledge-based interventions, and social modeling36.

The role of race, gender, and past experience in training outcomes

We did not detect differences in training outcomes based on gender, race, or other individual-level characteristics. However, we did find that women scored lower than men on most within-subjects metrics, and URM individuals scored lower than white participants for knowledge and self-efficacy. In other words, although men and women benefitted similarly from the training, women’s scores started lower and remained lower after training. This pattern could be explained by an underestimation of the difficulty of taking action in response to incidents of harassment and assault by individuals who have little or no experience with the issue (i.e. the Dunning–Kruger effect)51. While women, particularly women of color, are most likely to experience and report sexual harassment52,53, men have been shown to be more likely to characterize harassment incorrectly54 and less quickly55, less likely to believe harassment complaints, and more likely to respond poorly to harassment prevention training programs56. As with other social problems related to stigma and discrimination, people who are not directly impacted may be less likely to experience or see the problem, and therefore less likely to grasp the difficulty of responding to it. In addition, the lower reported scores for women and URM individuals may be connected to greater levels of mistrust in their institutions, for example if they have previously experienced or witnessed a negative outcome as a result of the reporting process57.

These gender and race patterns can be viewed within the broader literature that connects sexual or gender-based harassment to larger and intersectional issues of power, wherein identity-based harassment is used as an expression of dominance and a tool to enforce or protect an individual’s privileged sex-, race- or other identity-based social status within socially stratified and unequal systems6,50,58,59,60. These results point to a broader framing of harassment training efforts not only as a method to prevent incidents, but also as one tool in a more expansive effort to dismantle systemic power imbalances and pursue equity and justice in science and academia.

Surprisingly, personal norms and past behaviour did not moderate the impact of the training on behavioural intention, suggesting that the training is not only impactful for people already interested in or motivated by the topic. This is promising, given our demographic results showing lower scores reported for marginalized groups, and suggests that this kind of training can be helpful to people at different stages of knowledge about and experience with harassment.

However, our sample sizes, particularly the number of URM respondents (n = 55) compared to white respondents (n = 110) in our within-subjects sample, limit the generalizability of these results and may reflect the broad underrepresentation of marginalized racial and ethnic groups in governmental natural resources management as a whole61,62,63. The fact that no survey respondents identified as non-binary, transgender, or any other gender identity besides man or woman limits our understanding of how these trainings affect gender minorities, especially given recent evidence about the heightened risk of harassment for this population in the field and in general64,65,66. We also did not collect data on sexual orientation or disability66,67, gaps that could be filled by follow-up studies. Still, these findings indicate that to the extent that the training worked, it worked equally for these race and gender groups, and that the patterns observed for race and gender exist within participants regardless of training. Future training efforts could consider developing customized trainings that meet the needs and existing knowledge of different groups. It is also possible that women and URM respondents are underreporting their capabilities, and men and white respondents are over-reporting their capabilities. In situations where this is the case, future trainings could integrate our results within training content to support more accurate self-appraisals.

Implications for organizational initiatives

This study suggests that large state agencies (particularly those with high need for fieldwork activities) are useful platforms for deploying and testing harassment prevention programs, given their access to large numbers of field-going staff. There is growing evidence that the natural resource fields present obstacles to people with marginalized identities, and agencies will continue to face greater pressure to implement solutions68,69,70. This kind of interactive, peer-led training program could be replicated, tailored, and improved at other large state and federal agencies.

Our results also suggest limitations related to incident reporting that can be addressed by institutions. While we found that the training had positive immediate effects on reporting self-efficacy and intention, and a sustained effect on reporting intention, only three reports related to sexual harassment or assault were made to CDFW during the study period. This prevented rigorous analysis of actual reporting rates. Further, the low number of reports points to the inherent difficulty in using reporting rates as indicators of actual incident rates and responses45,56,71. The low number of reports is likely associated with the harmful effects of the primary reporting systems used by most large academic and research institutions (including CDFW), in particular universal mandatory reporting, which requires employees to report any incident of sexual harassment or misconduct they learn about to officials, even without the victim’s consent72. Mandatory reporting without consent has been demonstrated to discourage survivors from disclosing incidents and to conflict with survivors’ healing processes73, and it is unlikely to result in justice in the form of sanctions for the perpetrator74. Even worse, mandatory reporting can lead to retaliatory behaviour from alleged perpetrators75,76 and other coworkers in as many as 63% of workplace sexual harassment cases76. Alternatives to mandatory reporting have been suggested, including a shift toward “mandatory supporting” systems that prioritize confidential reporting options, require consent for official reports, and provide trauma-informed training so that employees can support survivors who do disclose72. These changes will be particularly important in the high-risk setting of fieldwork and at government agencies where reporting rates, as this study demonstrates, can be extremely low.

Efforts to improve the organizational climate of fieldwork settings, like that tested by this study, offer an alternative to reporting-focused initiatives with the goal of legal compliance rather than the elimination of harassment. However, it is important to note that real workplace climate improvement is not achievable by individual participation in trainings alone, and requires substantial and sustained institutional commitment. Agencies and other large institutions could fulfill their responsibilities towards staff safety by developing and funding efforts that reprimand and remove offenders and build effective, trauma-informed reporting systems that actually support and protect victims. Without institutional commitment, training programs that only target employees and do not tackle larger institutional barriers to inclusion risk sending mixed messages and can foist the burden of systemic change on individuals with the least power.

Study limitations and future research

In addition to the sample size and demographic limitations described above, working within a large governmental agency provided both challenges and opportunities for future research and recommendations. Obtaining control data (participants who completed surveys without being trained) was challenging due to the difficulty of incentivizing Time 1 survey completion for participants who were not scheduled to take the training until months later. One additional possibility is that the receipt of the Time 1 survey prompted individuals to sign up for the training itself, introducing potential biases in the order in which participants completed trainings. As a result, our final sample sizes were 50% of desired sizes for the treatment group and 10% for the control group, which limited the power of our treatment–control analyses to detect small and moderate effects. We also struggled to retain participants for longitudinal analyses (n = 64 out of 925 staff trained). Future research should strive to incorporate rigorous control groups and improve survey recruitment processes to allow for deeper demographic analyses. One way to achieve this might be to use a control intervention, like a “traditional” online training, that prompts participants to take the first survey wave, which could increase the control group survey sample. Also, providing incentives for survey completion like small prizes could help increase completion rates.

Our attempt to measure self-reported behaviour through a question about frequency (with options from “once a day or more” to “never”, Table S1) likely led to floor effects, with nearly all respondents choosing the lowest possible option, i.e. that they never did the behaviour while working in the field. This made it challenging to obtain robust responses related to self-reported behaviours. This challenge is inherent to sexual harassment and assault prevention research, as bystanders might only be called to take key actions once a field season—for instance, in the creation of a field safety plan. Future survey instruments could focus on more frequent “lower-level” behaviours that might be precursors to more direct forms of sexual harassment and assault prevention, such as actions to create inclusive organizational fieldwork climates. For example, survey questions that ask whether participants engaged in a community agreement exercise, or intervened to address microaggressions or expressions of implicit bias could be additional indicators of organizational climate. Recognizing the limitations to measuring behaviours through surveys, a second approach might be to integrate data collection about prevention behaviours into pre-existing organizational systems, such as by refining performance evaluations to measure and reward behaviours that promote inclusive cultures.

Interventions that help participants identify and diagnose a spectrum of exclusionary behaviours in others (e.g., self-protectionism, or defensive behaviour to protect one's perceived advantages) can also help elucidate how to take action toward building more holistic, inclusionary behaviours that promote belonging and psychological safety in fieldwork77.

Conclusion

This study provides support for the utility of an interactive, scenario-based training intervention for field-based staff and scientists at a large state agency, and can be a model for other large institutions looking to move beyond click-through online modules toward more interactive modes of harassment prevention training. Training should not be thought of as a panacea, but rather, especially in light of our results, as a way to open the door for larger conversations about organizational climate and inclusive settings, particularly in the high-risk setting of scientific fieldwork.