In an open communication climate, individuals expect that their teammates will be receptive and responsive to diverse and even dissenting views (Schiller & Cui, 2010). Open communication is a strong predictor of effective team performance (Bui et al., 2019). In fact, catastrophic team failures (e.g., Space Shuttle Challenger) are often attributed, at least in part, to a lack of open communication about employees’ concerns. It has been argued that structured interventions could be strategically designed to foster a meta-norm in which team members feel free to express their diverse and dissenting opinions and to reflect deeply on potential risks as well as opportunities (Schippers et al., 2014). One such intervention is a team debriefing.

Team debriefings are meetings in which team members gather for the purpose of team reflexivity (Lines et al., 2021). Team reflexivity is an “explicit information-processing activity” (Schippers et al., 2014, p. 735), whereby teams “overtly reflect upon, and communicate about the group’s objectives, strategies (e.g., decision making), and processes (e.g., communication), and adapt them to current or anticipated circumstances” (West et al., 1997, p. 296). Team reflexivity has been positively linked to adaptation and performance whether it takes place during or between performance episodes (Abrantes et al., 2021). Team debriefings take place during transitions between performance episodes. Specifically, a facilitator (either internal or external to the team) leads teams “through a series of questions that allow participants to reflect on a recent experience, construct meaning from their actions, and uncover lessons learned” (Tannenbaum & Cerasoli, 2013, p. 231). Most organizations employing teams conduct debriefings of some sort, although the structure, facilitation, and regularity of these vary substantially (e.g., Allen et al., 2018; Stoto et al., 2019). Also referred to as after-action reviews (e.g., Ellis & Davidi, 2005), post-mortems (e.g., Kasi et al., 2008), hot washes (e.g., Sinclair et al., 2012), or huddles (e.g., Reiter-Palmon et al., 2015) depending on the industry, team debriefings may occur after a significant operational incident, at the conclusion of a long-term project, or following regular work shifts. Moreover, they are believed to be a critical component of simulation-based training (Shinnick et al., 2011). Team debriefings have long been a staple in high reliability organizations (HROs), those that operate in environments where errors can be fatal (Weick et al., 2005), yet they are increasingly used in non-HRO environments such as public health (e.g., Stoto et al., 2019), information technology (e.g., Kasi et al., 2008), manufacturing (e.g., Chen et al., 2018), and business school education (e.g., Eddy et al., 2013).

Meta-analytic research on debriefings indicates that they are an effective method for improving team performance (Keiser & Arthur, 2021, 2022; Tannenbaum & Cerasoli, 2013). However, team debriefing methods vary substantially between and within organizations and very few studies have directly compared different methods in the same organizational setting (e.g., Eddy et al., 2013; Smith-Jentsch et al., 2008). Moreover, evidence regarding differences in the quality of team reflexivity that takes place during different debriefing methods is seldom collected and is poorly reported, let alone compared (Lines et al., 2021; Tannenbaum & Cerasoli, 2013). The paucity of primary studies in this regard has prevented authors of meta-analyses from providing organizations with guidance to help them choose between alternative methods to meet their unique objectives.

In this regard, theories of team reflexivity differentiate qualitatively different levels, ranging from deep to surface (Schippers et al., 2007; West, 2000). Transfer-appropriate processing theory (Craik & Lockhart, 1972) would suggest that surface (shallow) reflexivity should be more effective and efficient for teams that work in relatively routine task environments, whereas deeper reflexivity should be most effective in rapidly changing environments (Schippers et al., 2014). These propositions have largely gone untested to date because empirical research on team reflexivity has almost exclusively employed measures of frequency rather than depth (Konradt et al., 2016). In a notable exception, Schippers et al. (2007) found support for a two-factor scale differentiating shallow and moderate/deep levels of reflexivity depth. Shallow team reflexivity was described as “evaluating finished business” and “thinking about issues related closely to the task at hand.” By contrast, deeper team reflexivity was described as “double-loop” learning in which “communication patterns” are reflected upon at a “meta-level,” and members question “the norms and values of the team or organization” (p. 206). While this research begins to specify the behaviors that differentiate shallow from moderate/deep team reflexivity, it remains unclear how organizations can best structure interventions, such as a team debriefing, to target one level versus the other. The present research was designed to address this need.

Motivated Information Processing in Groups Theory and the Depth of Team Reflexivity

The Motivated Information Processing in Groups (MIP-G) model (De Dreu et al., 2008) proposes that team members’ willingness to expend cognitive effort determines whether they engage in shallow or deep information processing. In this regard, epistemic motivation (EM) has been defined as the desire and willingness to “expend effort to achieve a thorough, rich and accurate understanding of the world, including the group task or decision problem at hand” (De Dreu et al., 2008, p. 23). Those lower in EM are said to have a greater need for cognitive closure (NfCC). The NfCC is described as the desire to seek “a firm answer to a question, any firm answer as compared with confusion and/or ambiguity” (Kruglanski, 2004, p. 6). Both the NfCC and EM have been measured as traits and manipulated as states. Notably, results from prior research have consistently shown that EM/NfCC “state” effects mirror “trait” effects (e.g., Brizi et al., 2016; Bukowski et al., 2013; Chirumbolo et al., 2004a, 2004b; Choi et al., 2008; De Grada et al., 1999; Di Santo et al., 2020; Otto et al., 2016; Pierro et al., 2003). Moreover, the same environmental conditions (e.g., time pressure) that strengthen NfCC tend to weaken EM, and vice versa.

Individuals with a higher NfCC (lower EM) experience more negative emotions when confronted with complex information (Amit & Sagiv, 2013) or inconsistent feedback (Di Santo et al., 2020) and tend to “seize” and “freeze” on solutions to problems that are presented to them early in the decision-making process and are consistent with their pre-existing knowledge or biases (Bukowski et al., 2013; Kruglanski & Webster, 1991; Roets et al., 2006). Interpersonally, those with higher NfCC are less prone to take the perspective of others (Sparkman & Blanchar, 2017) and are more likely to exert pressure on their teammates to conform (Chirumbolo et al., 2004b; De Grada et al., 1999).

In addition to the distinction between EM and NfCC, MIP-G differentiates between proself and prosocial motivation. Team members’ preferences in this regard are expected to dictate the kind of information they are motivated to process (e.g., De Dreu et al., 2008). Prosocial motivation is expected to focus efforts on processing and communicating information that furthers shared team goals whereas proself motivation is expected to focus efforts on enhancing individualistic goals such as one’s status and power within a team (Bechtoldt et al., 2010). Those perceiving competitive interdependence (a proself motivator) report learning less from a collaborative partner (Tjosvold et al., 2005), whereas teams that perceive greater cooperative interdependence (a prosocial motivator) learn more from their reflexivity (De Dreu, 2007).

Theory and research on the MIP-G model can be used to posit which instructional features should facilitate and which should hinder deep reflexivity during team debriefings. The present research contributes to the science and practice of team debriefing in this regard. Specifically, we compared two debriefing methods: chronological and Team Dimensional Training (TDT). The chronological method derives from the tradition of accident investigation, in which actions and events are reconstructed in the order they unfolded, with the goal of isolating the root cause of a known performance failure (Hofmann & Stetzer, 1998). Consistent with descriptions of shallow team reflexivity, chronological debriefings focus on “evaluating finished business,” and prioritize discussion of issues “close to the task at hand.” By contrast, the explicit goal of TDT debriefings is for teams to learn to critique their own performance using generalizable principles of teamwork (Smith-Jentsch, 2018; Smith-Jentsch et al., 1998, 2008). This objective is consistent with the definition of deeper reflexivity as involving “double-loop learning” in which “communication patterns” are reflected upon at a “meta-level” (Schippers et al., 2007, p. 206). TDT has been used in many different organizations and industries (e.g., military, healthcare, business school education) (Smith-Jentsch, 2018) but is less commonly used than chronological debriefings. Previous research has shown that TDT debriefings result in higher quality mental models and greater adaptive transfer than do chronological debriefings (Smith-Jentsch et al., 2008). The present study extends these findings by directly comparing what teams do differently during the two debriefing methods, with the expectation that TDT debriefings motivate deeper reflexivity. Such information would assist organizations in determining when each method is most appropriate given the information processing demands of their intended transfer environment.

A second way in which the present research contributes to science and practice is by comparing chronological and TDT debriefings with respect to their impact on participants’ perceptions of open communication climate. Episodic models of team performance (Marks et al., 2001), and of team reflexivity in particular (Konradt et al., 2016), suggest that affective states emerging from team debriefings will become inputs to future transition and action episodes. Open communication climate has been linked to the transfer of knowledge within (Hofhuis et al., 2016) and between teams (Mueller, 2014). Moreover, employees are more willing to report adverse events (Yu et al., 2022), and to recognize a teammate as being the cause of an accident (Hofmann & Stetzer, 1998), when they perceive an open communication climate. Conversely, employees who perceive a lack of openness report withholding their views and concerns to a greater extent (Knoll et al., 2021). Although team debriefings (compared with no debriefings) have been shown to facilitate a more open communication climate (Jarrett et al., 2016; Villado & Arthur, 2013), prior authors have cautioned that experiences which reinforce expectations for shallow information processing could have detrimental effects (Konradt et al., 2016). In this regard, we could find no quantitative studies that compared the effects of alternative debriefing methods on perceptions of open communication climate. The present research provides evidence to guide organizations in selecting an optimal method for this purpose.

A final contribution of the present research is that we explored three contextual variables (i.e., prior performance, task experience, team size) as potential controls and potential moderators of the impact of debriefing method. These variables are commonly examined in research on team reflexivity in general and team debriefings specifically. However, they have not been examined as moderators of the relative superiority of one structured method over another, or as determinants of reflexivity depth. A better understanding of the potential boundary conditions associated with these variables is necessary for practitioners to assess the generalizability of our findings to their unique contexts.

MIP-G and Debriefing Method

Results from laboratory tests of the MIP-G model suggest two instructional features that should influence the depth of team reflexivity during debriefings. The first is the relative emphasis placed on outcome versus process accountability. Outcome accountability exists when individuals are evaluated based on the outcome of their decision-making, whereas process accountability is “the expectation of having to justify to others the decision process used, regardless of the outcome of the decision” (Peytcheva et al., 2014, p. 51). Process accountability has been found to strengthen self-reported EM relative to outcome accountability (e.g., de Langhe et al., 2011) or to no accountability (De Dreu et al., 2006; Scholten et al., 2007).

A second instructional feature that should impact the depth of team reflexivity is the degree to which cooperative or competitive interdependence is made salient. Cooperative interdependence exists when individuals believe that their own objectives/rewards are compatible with and are facilitated by their teammates’ objectives/rewards, whereas competitive interdependence exists when individuals perceive that their personal objectives/rewards (or punishments) and those of their teammates are inversely related (De Dreu, 2007). Individuals for whom cooperative interdependence is made salient report stronger prosocial motivation, whereas those for whom competitive interdependence is made salient report stronger proself motivation (De Dreu et al., 2006).

In sum, the MIP-G model and prior research supporting it suggest that a debriefing method which both emphasizes process accountability and makes cooperative interdependence salient should motivate deeper reflexivity than should a debriefing method that emphasizes outcome accountability and makes competitive interdependence salient. The debriefing methods compared in our quasi-experiment differed in both these respects.

Chronological Debriefings

Outcome Accountability

Chronological debriefings place a premium on reducing uncertainty regarding the root cause of a team performance outcome. Prior research has found that when uncertainty is made salient, individuals behave in a manner consistent with those high in NfCC (Brizi et al., 2016). Thus, it is not surprising that in chronological debriefings, teams selectively attend to those actions that are believed to have been consequential to the outcome of an episode in question.

Competitive Interdependence

In chronological debriefings, facilitators seek to gain consensus among members as to the sequence of events that unfolded. Prior research has shown that when a confederate teammate states that he/she wishes to “smooth over differences” in opinion, this weakens participants’ perception of cooperative interdependence (Tjosvold et al., 2005). Moreover, when a confederate teammate attempts to place blame on another, this strengthens the perception of competitive interdependence (Tjosvold et al., 2005). Deriving from the tradition of an accident investigation, facilitators of chronological debriefings are likely to fall into a similar “blame trap” whereby fault is placed on the person most proximal to the outcome (Reason, 1994).

Team Dimensional Training (TDT) Debriefings

TDT debriefings (Smith-Jentsch, 2018; Smith-Jentsch et al., 1998, 2008) would be classified as highly structured both in terms of content and procedure using the criteria employed in prior meta-analyses (i.e., exact questions and procedures scripted; Keiser & Arthur, 2021; Tannenbaum & Cerasoli, 2013). Specifically, instructions and questions are intentionally designed to communicate process accountability and cooperative interdependence among participants.

Process Accountability

Holding individuals to principles-based standards has been shown to strengthen their beliefs that they will be held accountable for the processes they used to make decisions rather than for the outcomes of those decisions (Peytcheva et al., 2014). At the start of every TDT debriefing, teams are informed that they will evaluate their own performance against four principles of effective teamwork.

“The event you just performed gave you an opportunity to practice working on four teamwork dimensions: Information Exchange, Communication Delivery, Supporting Behavior, and Leadership/Followership. Now, you have an opportunity to critique your own performance on these four dimensions. I will facilitate this process of Guided Team Self-Correction. The way this works is that I’m going to ask the team for examples of each dimension.”

Cooperative Interdependence

Instructions which emphasize the value of openness to opposing views, and problem solving without placing blame, have been shown to strengthen team members’ perceptions of cooperative interdependence (Tjosvold et al., 2005). The following instructions read by facilitators at the start of every TDT debriefing are expected to do the same.

“For this [debriefing] to be effective, all team members need to contribute to the process of identifying problems and coming up with solutions to those problems. So, feel free to make comments, ask questions, and offer suggestions on how you can improve on subsequent performance events.”

Given the documented impacts of process accountability and cooperative interdependence in prior research, our overarching proposition was that TDT debriefings motivate deeper team reflexivity and lead individual participants to perceive a more open communication climate than do chronological debriefings (see Fig. 1).

Fig. 1 Conceptual model

Controlling for Team Context

The present research was conducted in an operational training environment at NASA’s Johnson Space Center. As such, we statistically controlled a number of contextual variables that were not our focus but had the potential to threaten the validity of our results. It has been argued that researchers should provide a strong theoretical rationale for including control variables and that these variables should be included in hypothesis statements and models (Becker, 2005). Toward this end, the following sections describe the practical and theoretical rationale for three contextual variables that were included in the present research.

Prior Performance

It has been demonstrated that teams engage in quantitatively greater reflexivity following poor performance than they do following effective performance (Li et al., 2021). It is unclear, however, whether they also engage in qualitatively deeper reflexivity. If so, this may explain why the quantity of reflexivity is more strongly related to performance improvement following poor performance (Schippers et al., 2013). Therefore, we deemed it important to statistically control teams’ performance in the action episode (simulation exercise) being debriefed when testing the impact of debriefing method on our objective indicators of reflexivity depth.

Task Experience

Keiser and Arthur (2021) noted that few studies have investigated debriefings/AARs of teams that are highly experienced in their tasks. These authors posited that teams with greater task experience may be more capable of identifying and solving their own problems than those with lesser task experience. This could potentially strengthen, weaken, or reverse the relative benefit of one debriefing method over another. In the present study, we had the opportunity to compare the impact of the same two debriefing methods on teams at two levels of task experience. As such, we examined task experience as a potential control or moderating variable.

Team Size

Meta-analyses have found no evidence to suggest that team size moderates the effects of reflexivity interventions in general (Lines et al., 2021), or team debriefings specifically (Keiser & Arthur, 2021; Tannenbaum & Cerasoli, 2013). However, this prior research has not examined the effect of team size on the depth of reflexivity that takes place during team debriefings. Moreover, it has been noted that team size is often larger and more variable in organizational settings than it has been in prior studies of reflexivity, and that research conducted in such settings may find team size to be a significant predictor. In the present study, team size varied from 5 to 26. As such, we included team size as a potential control variable or moderator of the relationship between debriefing method and reflexivity depth.

Debriefing Method and Reflexivity Depth

As we have noted previously, few prior studies have sought to measure team reflexivity depth, and those that have done so relied on self-report scales (Schippers et al., 2007). In the following sections, we detail theoretical arguments in support of 5 objective indicators of deep team reflexivity. We expected these to show meaningful correlations with one another and to be impacted by debriefing method both individually and as a set.

Debrief Duration

In the absence of time limits or requirements, debrief duration can be conceived of as an indicator of teams’ motivation to engage in deep reflexivity. The MIP-G model suggests that those with a stronger EM are more willing to persist longer in an information processing task for the purpose of thoroughly understanding the world around them (De Dreu et al., 2008). Consistent with this notion, process accountability has been shown to increase team members’ EM and to lower the extent to which they feel that the information they hold individually is “sufficient” to make a decision prior to team discussion (Scholten et al., 2007). Also consistent with the MIP-G model, team discussions tend to last longer when members’ cooperative interdependence is made salient (Super et al., 2016). Conversely, when NfCC is induced, teams spend less time brainstorming than they do when NfCC is not induced (Chirumbolo et al., 2005). It follows that our first hypothesis stated:

  • Hypothesis 1: After controlling for a team’s prior performance errors, task experience, and size, teams will persist longer in TDT debriefings than they will in chronological debriefings.

Discussion Pace

Longer response delays have been associated with more complex information processing (Beller et al., 2022) and deeper learning strategies, whereas more rapid responding is associated with heuristic learning strategies (Farashahi et al., 2018) and decision errors (Calvillo, 2013). Moreover, respondents pause longer before providing disconfirming answers to questions than they do when providing confirmatory responses (Stivers et al., 2009). Therefore, we reasoned that less rapidly paced debriefing discussions would be an indicator of deeper team reflexivity.

Individuals with a stronger situational NfCC report less engagement in deep-learning strategies (Harlow et al., 2011), exert greater pressure on their teammates to conform (Chirumbolo et al., 2004b; De Grada et al., 1999), and exhibit shorter response delays (e.g., Roets et al., 2006). Moreover, instructions emphasizing the value of achieving consensus have been shown to reduce team members’ expression of disagreement (Postmes et al., 2001). Conversely, process accountability leads individuals to adopt more effortful cognitive strategies (e.g., de Langhe et al., 2011) and to seek more disconfirming evidence (Peytcheva et al., 2014) and these effects are mediated by EM. As such, our second hypothesis stated:

  • Hypothesis 2: After controlling for teams’ prior performance, task experience, and size, the pace of discussions during TDT debriefings will be slower than the pace of discussions during chronological debriefings.

Team Self-Correction

Challenging team norms and values is a key differentiating characteristic of deep team reflexivity (Schippers et al., 2007). As such, we considered decentralized problem identification (i.e., team self-correction) during a debrief to be an objective indicator of reflexivity depth. Those higher in NfCC prefer centralized leadership (Chirumbolo et al., 2004a) and report a greater willingness to comply with authoritarian leadership tactics (Bélanger et al., 2015), presumably because doing so affords more rapid consensus on issues. Moreover, laboratory research has found that when instructions emphasize the importance of rapidly achieving closure, individuals are more willing to allow a credible expert to make decisions on their behalf (Otto et al., 2016). Conversely, instructions emphasizing the importance of fully utilizing team members’ diverse perspectives have been found to decrease the centralization of a team discussion around the leader (Tost et al., 2013). In addition, process accountability facilitates adherence to behavioral display rules (Tunguz & Carnevale, 2011), and participants in TDT debriefings are explicitly instructed to critique their own performance. It follows that, in TDT debriefings, team members should rely less on their leader/facilitator to determine which errors are “discussion worthy” than they do during chronological debriefings.

  • Hypothesis 3: After controlling for teams’ prior performance, task experience, and size, problem identification will be less centralized around the leader/facilitator in TDT debriefings than it will be in chronological debriefings.

Breadth of Process Problems Raised for Discussion

The discussion of “process patterns” is a defining characteristic of moderate/deep team reflexivity (Schippers et al., 2007). In the context of a team debriefing, this requires that team members are motivated to share a broad range of behavioral observations. In this regard, members of teams made accountable for their processes have been shown to generate a greater number of unique ideas, particularly when accompanied by cooperative interdependence (Bechtoldt et al., 2010). Conversely, consensus norms have been shown to strengthen the well-documented tendency of teams to prioritize discussion of members’ common (vice unique) knowledge (Postmes et al., 2001).

Findings from prior research suggest that the NfCC coupled with proself motivation should particularly inhibit team members from discussing “potential problems” that have not yet become consequential. Specifically, those higher in NfCC report less motivation to exert effort when the reward for doing so is delayed or uncertain (Schumpe et al., 2017), and those with a greater fear of being blamed are less likely to report their “near misses” (Morris & Moore, 2000). Cooperative interdependence, however, has been linked to greater psychological safety (Chen & Tjosvold, 2012), and simply listening to a counterfactual story has been shown to prime one’s own counterfactual thinking (about “what ifs”) (Kray & Galinsky, 2003; Liljenquist et al., 2004). In this way, team members are likely to stimulate one another to uncover “potential problems” during TDT debriefings that would otherwise be ignored during chronological ones.

  • Hypothesis 4: After controlling for teams’ prior performance, task experience, and size, members will (collectively) raise a greater breadth of problems for discussion during TDT debriefings than they will during chronological debriefings.

Distribution of Speech Turns

In a team debriefing, fully utilizing all available sources of information requires that team members are willing to cooperatively share “floor time.” However, some debrief participants intentionally remain silent unless forced to speak (Andersen et al., 2018), and facilitators often disproportionately call upon team members with greater seniority and those most proximal to key events (Coggins et al., 2022). The distribution of speech turns has been found to be a better predictor of team performance on a variety of tasks than members’ mean or maximum level of intelligence (Engel et al., 2014; Woolley et al., 2010). Presumably, teams that share the floor more cooperatively capitalize better on members’ uniquely held knowledge and information.

When NfCC is situationally induced, teams exhibit less even patterns of turn taking with the most socially dominant members both speaking and being spoken to a disproportionate number of times (De Grada et al., 1999; Pierro et al., 2003). By contrast, teams exhibit a more even-handed treatment of members’ uniquely held information when process accountability is combined with cooperative interdependence (Super et al., 2016).

  • Hypothesis 5: After controlling for teams’ prior performance, task experience, and size, team members’ turns to speak will be more evenly distributed during TDT debriefings than they will be during chronological debriefings.

Perceptions of Open Communication Climate

Psychological climate is considered an individual-level construct because individuals perceive cues in the environment through the lens of their unique desires, prior experiences, and views of the world (James et al., 2008). Individuals form perceptions of climate by observing behavior that is encouraged, modeled, and rewarded (James et al., 1977). More specifically, theory and research on the epistemic tuning hypothesis (Lunn et al., 2007) indicate that individuals are motivated to adopt the epistemic values of those in their immediate interpersonal context. As such, participants in a team debriefing should be motivated to attend to their teammates’ level of engagement (shallow-deep) in team reflexivity, and these observations should affect their perceptions of open communication climate.

Open communication has been defined as the “propensity to tolerate, encourage, and engage in open, frank expression of views” (Amason & Sapienza, 1997). Prior laboratory research has shown that members of teams higher in NfCC exert greater pressure on one another to conform (Chirumbolo et al., 2004b; De Grada et al., 1999), and that instructions emphasizing consensus inhibit the expression of disagreement (Postmes et al., 2001). Conversely, instructions emphasizing cooperative interdependence and the value of team members’ unique perspectives strengthen perceptions that open communication is accepted, expected, and psychologically safe (Chen & Tjosvold, 2012; Tjosvold et al., 2005; Tost et al., 2013). It follows that our final hypothesis stated:

  • Hypothesis 6: After controlling for teams’ prior performance, task experience, and size, the communication climate in TDT debriefings will be perceived to be more open than the communication climate in chronological debriefings.

Method

Study Design

The present research employed a quasi-experimental design. A quasi-experiment is conducted in a “natural social setting” in which the researcher introduces a manipulation but “lacks the full control over the scheduling of experimental stimuli (the when, and to whom of exposure and the ability to randomize exposures)” making a true experiment impossible (Campbell & Stanley, 2015, p. 34). The present quasi-experiment was conducted in a very fluid operational environment (NASA Johnson Space Center) in which individuals participated in simulation-based training and evaluation together with different groups of ad hoc teammates on a regular basis. Accordingly, while any two individuals (i.e., flight controllers) are likely to train and work with one another repeatedly, an entire ad hoc team (composed of up to 26 members in the present research) may or may not ever train (or work) together more than once.

Participants

Teams

A total of 69 ad hoc teams participated in the present research. During the period of data collection, the first 35 ad hoc teams scheduled for training participated in debriefings conducted according to the existing practice (i.e., chronological). Next, Flight Directors (internal team leaders) were trained to use the TDT method of debriefing. The following 34 ad hoc teams scheduled for training participated in TDT debriefings.

Team Members

In total, 76 individuals provided ratings of open communication climate as part of this research. Forty-one of these individuals participated in only one of the ad hoc teams. The remaining individuals participated in more than one ad hoc team debriefing.

Facilitators

While TDT debriefings can be either self-led (peer-to-peer) or expert-led (facilitated by an internal or external leader), in the present study, they were led by the internal team leader. Sixteen Flight Directors participated as facilitators in the present research. Of these, eight were observed facilitating only chronological debriefings, five were observed facilitating only TDT debriefings, and three were observed facilitating at least one of each type of debriefing.

Procedure

Facilitator Training

Flight Directors who facilitated TDT debriefings received 1 day of formal classroom instruction. This lasted 8 h and incorporated best practices for leadership training as identified in a recent meta-analysis (Lacerenza et al., 2017). Specifically, the content was based on a training needs analysis conducted jointly by a group of scientists and practitioners (Smith-Jentsch et al., 2015), and the process included information, demonstration, practice, and feedback. One to two weeks later, Flight Directors spent half a day practicing using TDT to debrief a real team in the same environment where the study was to be conducted. The first author listened to these debriefings live and provided detailed feedback to the facilitators immediately afterward.

Simulation Environment

Each team participated in a 3-h simulation exercise that was conducted in a fully equipped mission control room at Johnson Space Center in Houston, Texas. Verbal communications during the simulation primarily took place via an audio-only channel. Infrequently, members would walk from their consoles to another console and speak to one another face-to-face. Debriefings took place immediately following the simulation.

Debriefing Method

There was no time limit for debriefings. Immediately following each debriefing, participants either left for lunch or left work for home. Team members remained at their consoles and participated in the debriefings via the same audio network they had used in the simulation. Given their limited visual contact, participants stated their own position title, and that of the individual to whom they directed a question or comment, at the start of each speech turn. This was standard practice and allowed us to differentiate speakers for the purpose of transcription.

Chronological Debriefings

Review of audio recordings and transcripts confirmed that facilitators began each chronological debriefing by summarizing what took place during the prior action episode, highlighting what they considered to be critical junctures in the timeline. Next, they facilitated a discussion whereby the teams’ actions and the outcomes of those actions were reconstructed/confirmed in the chronological order that they unfolded. As is common in chronological debriefings, facilitators directed the majority (but not all) of their questions to specific team members (addressed by functional team role).

TDT Debriefings

Our review of audio recordings and transcripts revealed that facilitators also began TDT debriefings by summarizing what took place during the prior action episode, highlighting what they considered to be critical junctures in the timeline. However, following this summary, facilitators read the scripted TDT instructions in which process accountability and cooperative interdependence were communicated. Finally, facilitators asked teams a series of scripted questions about their performance in the prior episode, one teamwork category at a time. Questions within the debriefing guide were read largely word for word. The identical guide was used for both low- and high-task experience teams. All scripted questions in the TDT debriefing guide are addressed to the team collectively rather than to individual team members (e.g., “Looking back, can any of you recall an instance when…”). A review of the transcripts indicated that TDT facilitators followed this guidance the first time they asked each question from the guide. However, they often directed additional follow-up questions to specific individuals (by team role). For each category of teamwork, teams were asked first to recall positive examples and then to recall negative examples (e.g., “Can you recall a time when someone provided you with useful backup? Can you recall a time when you needed backup but did not receive it?”). After each teamwork problem that was shared by a team member, the facilitator asked the team about factual and counterfactual results (e.g., “How did this, or could this have, led to errors or confusion?”), as well as possible solutions for the problem (e.g., “What was the intended goal, direction, or priority, and who could have made this clearer? Who could have asked for clarification?”). Multiple questions are provided for each of the four teamwork processes in the TDT debriefing guide (i.e., information exchange, communication delivery, supporting behavior, leadership/followership). A review of the transcripts indicated that all facilitators asked a minimum of one negative and one positive question about each of the categories, and that all teams provided at least one positive and one negative example of each teamwork dimension. The number of positive examples generated by a team during their TDT debriefing was highly correlated with the number of negative examples generated (r = 0.51, p < 0.01). Moreover, the number of examples (positive and negative combined) generated by teams was consistent across the four teamwork dimensions (coefficient alpha = 0.68).
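To illustrate the internal-consistency estimate reported above, the following is a minimal sketch of how coefficient alpha can be computed over per-team example counts. The function and the example data are hypothetical stand-ins; the value reported above would have been computed from the actual transcripts.

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for an (n_teams x n_items) score matrix.

    Here each 'item' is the number of examples a team generated for one
    of the four teamwork dimensions (information exchange, communication
    delivery, supporting behavior, leadership/followership).
    """
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of team totals
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical counts for three teams across the four dimensions:
counts = np.array([[3, 2, 4, 3],
                   [1, 1, 2, 1],
                   [5, 4, 4, 5]])
print(round(cronbach_alpha(counts), 2))
```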

Measures

All debriefings were audio recorded, transcribed, and coded to obtain our 5 indices of reflexivity depth. Data on who participated in which debriefing, as well as their task experience (i.e., operator or specialist), the team size, and the number of errors observed by instructors during each simulation were obtained from the organization’s training logs.

Control Variables/Potential Moderators

Team Task Experience

Teams at two levels of experience participated in our quasi-experiment: 33 operator teams (lower experience) and 36 specialist teams (higher experience). Attempts to counterbalance within our two quasi-experimental conditions were largely successful, with 16 operator teams and 19 specialist teams participating in chronological debriefings, and 17 operator teams and 17 specialist teams participating in TDT debriefings. Team experience was coded as a dichotomous variable (operators = 0, specialists = 1) for analysis.

Team Size

Team size ranged from 5 to 26 members with a mean of 14.12, sd = 4.86. The modal team size was 12 (11 teams) and the median was 14. Results from an independent t test revealed that the average team size was larger in the chronological debriefing condition than it was in the TDT condition (15.83 versus 12.56), t(67) = 2.95, p < 0.01, and that teams of specialists were larger than teams of operators (16.11 versus 12.16), t(67) = −3.68, p < 0.01. We examined team size first as a continuous predictor variable and separately as a dichotomous variable based on a median split of teams as 37 small (≤ 14) and 32 large (> 14) for the purpose of our MANOVA analysis.

Prior Performance Errors

External instructor/observers recorded the number of significant errors made by each team during their simulation exercises. These ranged from 0 to 7. For 14 of the teams, instructor/observers recorded no significant errors. The modal number of errors was 1 (17 teams), the mean was 2.01, sd = 1.70, and the median was 2.00. The mean number of instructor-/observer-rated errors did not differ significantly between teams that participated in chronological and TDT debriefings (2.00 versus 2.03), t(67) = −0.07, ns, nor did it differ between low-experience (i.e., operator) and high-experience (i.e., specialist) teams (2.30 and 1.75, respectively), t(67) = −1.36, ns. We examined prior performance errors first as a continuous predictor and separately as a dichotomous variable based on a median split (31 teams classified as having made few errors (0–1) and the remaining 38 as having made a greater number of errors (> 1)) in our MANOVA test.

Indicators of Reflexivity Depth

Debrief Duration

The duration of debriefings was measured from audio recordings. Debrief duration ranged from 10.28 to 72.02 min, with a mean of 40.07 min, sd = 15.33 min.

Discussion Pace

Discussion pace was measured by dividing the total number of words spoken in a debriefing by the length of the debriefing in seconds. Scores on this variable ranged from 1.66 to 4.66 words per second with a mean of 2.79, sd = 0.39.
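As a concrete illustration, discussion pace can be computed as sketched below; the transcript representation (a list of speech-turn strings) is a hypothetical stand-in for our transcription format.

```python
def discussion_pace(speech_turns: list[str], duration_seconds: float) -> float:
    """Words per second: total words spoken divided by debrief length."""
    total_words = sum(len(turn.split()) for turn in speech_turns)
    return total_words / duration_seconds

# A 40-minute (2,400-s) debriefing containing 6,700 spoken words would
# yield a pace of about 2.79 words per second, near the sample mean.
```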

Breadth of Problems Identified

Two raters familiar with the team task were trained to assess the number of unique problems raised for discussion during each debriefing from transcripts. One of these raters assessed all debriefings, whereas the second assessed 33 of the transcripts as a means of determining inter-rater reliability. The correlation between the two raters’ assessments for the 33 debriefing transcripts that they both reviewed was 0.88. Assessments made by the first rater who evaluated all transcripts were used to test our hypotheses. Given that participants were not face-to-face during the debriefing, it was customary to begin each speech turn by stating one’s role on the team. This enabled us to determine whether a particular problem was raised for discussion by a team member (vice the facilitator). The number of problems raised for discussion by team members ranged from 0 to 18 with a mean of 4.23, sd = 4.67.

Team Self-Correction

The number of problems raised for discussion by facilitators ranged from 0 to 6, with a median of 1, a mode of 0 (22 debriefings), and a mean of 1.54, sd = 1.52. The number of problems raised for discussion by the facilitator was highly correlated with the number of problems identified by instructor/observers during the simulation (r = 0.89, p < 0.01). The number of errors facilitators raised for discussion did not differ between chronological and TDT debriefings (1.71 and 1.35, respectively), t(67) = 0.99, ns, nor did it differ between low-experience (operator) and high-experience (specialist) team debriefings (1.61 and 1.47, respectively), t(67) = 0.36, ns. The degree to which teams took responsibility for team self-correction was indexed by computing the percentage of the problems noted during a team debriefing that were first raised by a team member relative to the total number of problems raised for discussion. These values ranged from 0.00 to 1.00, with a mean of 0.55, sd = 0.42.
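A minimal sketch of this index, assuming hypothetical per-debriefing counts of who first raised each problem:

```python
def team_self_correction(member_raised: int, facilitator_raised: int) -> float:
    """Proportion of all discussed problems first raised by a team member."""
    total = member_raised + facilitator_raised
    return member_raised / total if total > 0 else float("nan")

# E.g., 6 member-raised and 2 facilitator-raised problems:
print(team_self_correction(6, 2))  # 0.75
```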

Distribution of Team Members’ Turns to Speak

The distribution of team members’ turns to speak was calculated in a manner consistent with prior laboratory research in which the NfCC was experimentally manipulated (e.g., Pierro et al., 2003). First, the number of turns to speak was summed for each member of the team from transcripts. Next, we computed the standard deviation of these speech turns. Finally, the standard deviation was divided by the total number of speech turns taken by team members collectively. In this way, higher scores correspond to more uneven participation in a debriefing, whereas lower scores represent more democratic patterns. Speech turn distribution ranged from 0.14 to 2.0, with a mean of 0.86, sd = 0.43.
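Implementing the computation exactly as described above, the index can be sketched as follows; the turn counts are hypothetical.

```python
import numpy as np

def speech_turn_distribution(turn_counts: list[int]) -> float:
    """Standard deviation of members' speech-turn counts divided by the
    total number of turns; higher values indicate more uneven participation."""
    counts = np.asarray(turn_counts, dtype=float)
    return counts.std(ddof=1) / counts.sum()

# Hypothetical 5-member teams:
print(speech_turn_distribution([10, 10, 10, 10, 10]))  # 0.00, perfectly even
print(speech_turn_distribution([40, 3, 3, 2, 2]))      # ~0.34, dominated
```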

Open Communication Climate

A total of 156 climate ratings were collected (75 for chronological debriefings and 81 for TDT debriefings). These were obtained from 76 flight controllers for whom we had complete data on all control variables. Forty-one of these individuals provided climate ratings for only 1 debriefing, 13 individuals provided ratings for 2 debriefings, 6 individuals provided ratings for 3 debriefings, 11 provided ratings for 4 debriefings, 3 provided ratings for 5 debriefings, and 2 provided ratings for 6 debriefings. The number of individuals providing climate ratings for the same ad hoc team debriefing ranged from 2 to 11. Climate ratings were made immediately following these debriefings using a single 6-point rating scale (“How would you describe the climate during the debriefing you just participated in?”) anchored with the adjectives open (6) and closed (1). We obtained ratings from participants that ranged from 2 to 6, with a mean of 5.29, sd = 0.85.

Results

Analysis Plan

Given the theoretical argument that our 5 team-level DVs are objective indicators of deep team reflexivity, and the fact that they exhibited moderate to high intercorrelations, we began by using Multivariate Analysis of Variance (MANOVA) to test the impact of debriefing type (chronological or TDT) on these variables as a set to control for family-wise error. Prior performance and team size were entered as dichotomous variables based on median splits so that two-way interactions (exploratory) between debriefing method and the three control variables could be explored. Univariate ANOVAs were then used to examine the impact of debriefing method, the three contextual variables (again as dichotomous variables based on median splits), and two-way interactions between method and each contextual variable on our DVs individually, but only in cases where a given IV had exhibited a significant multivariate effect. Next, we conducted Hierarchical Linear Modeling (HLM) analyses to determine whether the random factor of the facilitator conducting the debriefing explained significant variance in any of our DVs, alone or in combination with debriefing method and our three contextual variables (as continuous predictors). Finally, HLM was used to test our last hypothesis at the individual level of analysis. In this analysis, we controlled for the random factors of the participant (many of whom participated in multiple debriefings) and the ad hoc team, in addition to debriefing method and our three contextual variables (as continuous variables).
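For readers who wish to trace this sequence, the first two stages can be sketched in Python with statsmodels as below. The file and column names (teams.csv, duration, pace, etc.) are hypothetical stand-ins for our variables, and the original analyses may have been conducted in other statistical software.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.multivariate.manova import MANOVA

teams = pd.read_csv("teams.csv")  # hypothetical team-level data set

# Median splits used for the exploratory interaction terms
teams["size_hi"] = (teams["team_size"] > teams["team_size"].median()).astype(int)
teams["errors_hi"] = (teams["prior_errors"] > teams["prior_errors"].median()).astype(int)

# Stage 1: omnibus multivariate test on the five depth indicators as a set
mv = MANOVA.from_formula(
    "duration + pace + breadth + self_correction + turn_distribution"
    " ~ method * errors_hi + method * size_hi + method * experience",
    data=teams,
)
print(mv.mv_test())

# Stage 2: follow-up univariate ANOVA for one DV (repeated per DV, but
# only for IVs that showed a significant multivariate effect)
model = ols("duration ~ method * errors_hi + method * size_hi"
            " + method * experience", data=teams).fit()
print(sm.stats.anova_lm(model, typ=3))
```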

Relationships Between Team-Level Study Variables

Means, standard deviations, and correlations among our team-level variables can be found in Table 1. All bi-variate correlations among our 5 objective indicators of reflexivity depth were in the expected direction and all but 2 were statistically significant.

Table 1 Means, standard deviations, and correlations among team-level variables

Multivariate Analysis of Variance (MANOVA)

We conducted a Multivariate Analysis of Variance (MANOVA) to test the impact of debriefing method (i.e., chronological or TDT) on our set of 5 team-level indices of reflexivity depth. This was done first with and then without our 3 contextual variables (i.e., prior performance errors, task experience, team size) and three interaction terms to explore moderation (method x prior performance, method x task experience, method x team size). In this analysis, our 3 contextual variables were dichotomized based on median splits because MANCOVA assumes that covariates do not interact with fixed factors. However, we did run a MANCOVA (not included in Table 2) in which prior performance and team size were treated as continuous covariates (without interaction terms) and obtained the same results.

Table 2 Multivariate effects on the depth of team reflexivity

As shown in Table 2, debriefing method, task experience, and prior performance errors each exhibited a significant multivariate impact on the set of 5 DVs. Additionally, the multivariate test for the interaction of debriefing method and prior performance errors was significant. The following sections report on the univariate ANOVA tests for these IVs conducted on each DV individually.

Univariate Analysis of Variance (ANOVA)

Debrief Condition

Results from our univariate ANOVAs indicated that debriefing condition significantly impacted all 5 objective indicators of reflexivity depth in the hypothesized direction. As shown in Table 3, TDT debriefings were longer (F(1, 61) = 23.40, p < 0.001), slower (F(1, 61) = 12.00, p < 0.001), and covered a greater breadth of problems (F(1, 61) = 58.31, p < 0.001) than did chronological debriefings. Moreover, teams (collectively) identified a greater proportion of their own problems (relative to the total number identified) (F(1, 61) = 67.78, p < 0.001) and took more evenly distributed turns to speak in TDT debriefings than they did during chronological debriefings (F(1, 61) = 20.69, p < 0.001). These results provide strong support for Hypotheses 1–5.

Table 3 Univariate effects for debriefing method

Team Task Experience

Results from our univariate ANOVA tests indicated that team task experience was significantly related to 4 of our 5 indicators of reflexivity depth (see Table 4). Specifically, teams with higher experience (i.e., specialists) persisted longer in their debriefings (F(1, 61) = 13.87, p < 0.001), took responsibility for a greater proportion of team self-correction (F(1, 61) = 4.46, p < 0.05), and identified a greater breadth of problems for discussion (F(1, 61) = 5.52, p < 0.05) than did teams with less experience (i.e., operators). However, more experienced teams exhibited less evenly distributed turn taking than did lower experience teams (F(1, 61) = 10.67, p < 0.005).

Table 4 Univariate effects for task experience

Prior Team Performance Errors

The number of performance errors made by teams in the action episode (simulation) being debriefed was significantly related to 2 of the 5 DVs (see Table 5). Specifically, those who had made a greater number of errors persisted longer in their debriefings (F(1, 61) = 10.26, p < 0.005) and engaged in less team self-correction than did those who had made fewer errors (F(1, 61) = 15.13, p < 0.005).

Table 5 Univariate effects for prior performance errors

Finally, as depicted in Table 6 and Fig. 2 (note that higher scores for distribution indicate greater variability/less evenness), the number of prior performance errors moderated the impact of debriefing method on the distribution of members’ turns to speak (F(1, 61) = 21.89, p < 0.001). Specifically, there was a simple effect of debriefing condition on speech turn distribution (TDT leading to more evenly distributed turns) for teams which instructors rated as having made 0–1 significant errors in the prior simulation (F(1, 61) = 34.81, p < 0.001), but not for those which instructors rated as having made more than 1 significant error (F(1, 61) = 0.04, p = 0.84). Additionally, there was a simple effect of prior performance errors on the distribution of speech turns in chronological debriefings such that teams rated as having made 0–1 errors in the simulation exhibited a much less even distribution of speech turns than teams rated as having made more than 1 significant error (F(1, 61) = 19.05, p < 0.001). There was also a simple effect of prior performance errors on speech turn distribution during TDT debriefings, but in the opposite direction (F(1, 61) = 5.14, p < 0.05). Teams that made 0–1 errors in the simulation took more evenly distributed turns to speak than did those rated as having made more than 1 significant error.

Table 6 Univariate results for the interaction of prior performance errors and debriefing method
Fig. 2 Interaction of debriefing method and prior performance errors on distribution of speech turns

Hierarchical Linear Modeling (HLM)

Unique Effects of Debriefing Method and Debriefing Facilitator

A series of HLM analyses was conducted to examine whether the random factor of debrief facilitator, alone or in combination with debriefing method, explained significant variance in our 5 team-level DVs. Results from these analyses indicated that the random factor of the facilitator by itself was a significant predictor of only one of our team-level DVs: discussion pace (see Table 7, model 1). Consistent with the results from our univariate ANOVA, debriefing method accounted for unique variance in discussion pace even after controlling for the facilitator (see Table 7, model 2).
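A minimal sketch of these two models, reusing the hypothetical team-level data frame and column names from the earlier sketch, with a random intercept for facilitator:

```python
import pandas as pd
import statsmodels.formula.api as smf

teams = pd.read_csv("teams.csv")  # hypothetical team-level data set

# Model 1: facilitator (random intercept) alone
m1 = smf.mixedlm("pace ~ 1", data=teams, groups=teams["facilitator"]).fit()

# Model 2: does debriefing method explain unique variance beyond facilitator?
m2 = smf.mixedlm("pace ~ method + prior_errors + experience + team_size",
                 data=teams, groups=teams["facilitator"]).fit()
print(m1.summary())
print(m2.summary())
```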

Table 7 HLM model—team-level discussion pace

Unique Effects of Debriefing Method, Participant, and Ad Hoc Team on Individual-Level Perceptions of Open Communication Climate

Hierarchical Linear Modeling (HLM) analyses were conducted to examine the cross-level effects of team debriefing method on individual-level perceptions of open communication climate both with and without the inclusion of our contextual variables (team size and prior performance as continuous variables), and two-way interactions between method and each control. In addition to the three team-level control variables that were included in our previous analyses, we also included a dichotomous variable that reflected whether the individual was being formally assessed for certification in the simulation that was being debriefed. Finally, we controlled for variance associated with the random effect of the participant, and of the ad hoc team with whom they performed/debriefed.
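Because participants were crossed with ad hoc teams rather than nested within them, this model requires crossed random effects. One way to express such a model in statsmodels is to treat each random factor as a variance component within a single all-encompassing group, as sketched below; the data file and column names (including assessed, the certification dummy) are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

ratings = pd.read_csv("climate_ratings.csv")  # hypothetical; one row per rating

# A single dummy group spanning the sample lets participant and team enter
# as crossed variance components rather than nested grouping factors.
ratings["all"] = 1
vc = {"participant": "0 + C(participant)", "team": "0 + C(team)"}

model = smf.mixedlm(
    "climate ~ method + prior_errors + experience + team_size + assessed",
    data=ratings, groups="all", vc_formula=vc,
).fit()
print(model.summary())
```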

As shown in Table 8, results indicated that the random effect of the participant was significant whereas the random effect for the ad hoc team was not. Additionally, climate was perceived to be more open in larger teams, and in teams that had made a greater number of performance errors. Participants who were being formally evaluated for certification during the simulation being debriefed perceived their climate to be less open than did those who were not being formally evaluated. Finally, participants perceived the climate in TDT debriefings to be more open than the climate in chronological debriefings (p = 0.025, one-tailed; p = 0.051, two-tailed). Given that this relationship was in the predicted direction, the one-tailed p-value indicates support for Hypothesis 6. We explored the possibility that our control variables may have interacted with debriefing method in a separate analysis (not reported here); none of these interactions were significant. As shown in Table 8, model 2, debriefing method was not a significant predictor of climate without the set of control variables included.

Table 8 HLM model—individual-level climate perceptions

Discussion

Results from the present quasi-experiment demonstrated multi-level effects associated with debriefing method using ad hoc work teams in an operational training and evaluation environment at NASA. We found strong and consistent support for our major thesis that TDT debriefings lead to deeper reflexivity and a more open communication climate than more traditional chronological debriefings. In the sections below, we detail the theoretical and practical implications of these findings, note study limitations, and offer directions for future research.

Theoretical Implications

TDT debriefings and chronological debriefings fundamentally differ with respect to two instructional features: accountability (process or outcome respectively) and teammate interdependence (cooperative or competitive, respectively). As such, our findings complement and extend existing research and theory associated with the MIP-G model (De Dreu et al., 2008). Moreover, results from the present research can be used to further specify the characteristics that distinguish shallow from deep team reflexivity (Konradt et al., 2016; Schippers et al., 2007). Finally, our results contribute to filling a gap noted in the research on team debriefings (Keiser & Arthur, 2021). That is, rather than asking the question “does structure/facilitation matter?” we sought to understand how 2 alternative methods of structuring/facilitating a debrief affect objective indicators of team behavior.

Discussion Length

Prior laboratory research has demonstrated that both process accountability (Hausser et al., 2017) and cooperative interdependence (Super et al., 2016) lengthen the time voluntarily spent on a decision-making task. Consistent with these findings, teams persisted longer in their team reflexivity during TDT debriefings than they did during chronological debriefings. It is important to note that in the present study, teams were given no formal time limits or time requirements for debriefing. As such, we reasoned that variability in debriefing length should be driven (largely) by teams’ beliefs about what level of team reflexivity was “sufficient.” Theory and research supporting the MIP-G model suggest that EM and NfCC are closely tied to such beliefs (De Dreu et al., 2008; Scholten et al., 2007). However, additional research would be needed to explore whether EM and/or NfCC mediate the relationship between debriefing method and length found here. Another direction for future research is to explore the possibility that, in environments where time limits are imposed for debriefings, the length of time allotted may interact with debriefing method. For instance, it is possible that rushing through or abruptly ending a TDT debriefing sooner than team members desire may backfire as participants become frustrated.

Discussion Pace

Discussions during TDT debriefings were significantly slower in pace than were those in chronological debriefings. Moreover, our qualitative review of debriefing audio confirmed that the discussion pace in TDT debriefings was slowed primarily by long pauses between speech turns (vice slower speaking). We expected this to occur because TDT debriefings motivate more effortful cognitive processing (e.g., mental simulation of counterfactual scenarios) and a willingness to express contradictory thoughts, both of which have been associated with longer response delays (Beller et al., 2022; Stivers et al., 2009). The fact that team members raised a greater breadth of unique problems for discussion during less rapidly paced debriefings supports this notion. Moreover, it argues against an alternative possibility, namely that slower discussions might have reflected participants' reluctance to speak (Andersen et al., 2018), since non-answers (e.g., "I don't know," "I can't recall") also tend to be preceded by longer delays than are answers (Stivers et al., 2009).

Very brief pauses between speaker changes during a team discussion are likely to induce time pressure. Time pressure, in turn, is associated with weaker EM and stronger NfCC (Bechtoldt et al., 2010). When time pressure is explicitly induced, teams spend less time brainstorming, generate fewer unique ideas (Chirumbolo et al., 2005), and exhibit less democratic patterns of turn-taking, with greater centralization of influence around socially dominant members (Pierro et al., 2003). Similarly, in the present study, discussion pace was negatively associated with debrief length, the breadth of problems identified, the degree of team self-correction, and the even distribution of speech turns. We argue that discussion pace is a salient cue signaling time pressure and, as such, may act as a mechanism through which epistemic tuning in teams takes place (Lunn et al., 2007). Future research is needed to explore this notion.
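As an illustration of how pace-related indicators might be derived from a timestamped turn log, consider the sketch below. The field names, and the specific definitions of pace (speech turns per minute) and of inter-turn gaps (silence between consecutive turns), are our assumptions rather than the study's exact operationalizations.

```python
# Hypothetical pace metrics from a timestamped speech-turn log.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str
    start: float  # seconds from start of debrief
    end: float

def discussion_pace(turns: list[Turn]) -> float:
    """Speech turns per minute across the whole debrief."""
    duration_min = (turns[-1].end - turns[0].start) / 60.0
    return len(turns) / duration_min

def mean_inter_turn_gap(turns: list[Turn]) -> float:
    """Mean silence (s) between the end of one turn and the start of the next."""
    gaps = [max(0.0, nxt.start - cur.end) for cur, nxt in zip(turns, turns[1:])]
    return sum(gaps) / len(gaps)
```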

Breadth of Problems Noted

The number of team errors noted by instructors did not differ as a function of debriefing method. However, teams in TDT debriefings raised a significantly greater breadth of problems for discussion than did those in chronological debriefings. Reflection on process patterns is a defining characteristic of deep team reflexivity (Schippers et al., 2007). In this regard, the average number of problems raised for discussion by team members in chronological debriefings was less than one; thus, it is impossible to say that those teams discussed "patterns" at all. Instead, their selective attention is more consistent with the tradition of accident investigations, which notoriously fixate on the action most proximal to a decision-making failure (Reason, 1994). As expected, teams in chronological debriefings appear to have largely spent their time "evaluating finished business" and "closely related issues," consistent with the definition of shallow reflexivity (Schippers et al., 2014). A qualitative review of transcripts from chronological debriefings revealed that nearly all the problems raised for discussion by team members were closely tied to the outcome of the event being debriefed. This is consistent with our contention that outcome (vice process) accountability was emphasized, and that competitive interdependence (the desire to attribute blame) was made salient in these debriefings.

By contrast, many of the problems raised for discussion by teams during TDT debriefings were "near misses" or "accidents waiting to happen" realized through counterfactual thinking. These were not, however, simply trivial issues. Instead, they were topics of discussion that substantially delayed the conclusion of the debriefing, as evidenced by the positive correlation between breadth and debrief duration. All teams in TDT debriefings generated at least one example of each teamwork dimension, and the number of examples generated by a team was internally consistent across categories. Additionally, teams that discussed a greater number of problems also discussed a greater number of positive examples in their TDT debriefings. These findings bolster our claim that teams took accountability for reflecting on their teamwork processes with respect to the four TDT dimensions.

Team Self-Correction

As predicted, teams in TDT debriefings identified a greater proportion of their own problems than did those in chronological debriefings. It is important to note that the number of problems identified by facilitators did not differ between the two methods. Thus, method-related differences in team self-correction were driven by an increase in the number of problems raised for discussion by team members rather than a decrease in facilitator-generated topics. Moreover, a qualitative review of transcripts revealed that facilitators began both types of debriefings by identifying what they believed to be key junctures/problems in the prior performance episode. Thus, our findings are consistent with those from prior laboratory studies in which instructions used to induce NfCC led participants to selectively attend to cues presented to them early in a decision-making task (Bukowski et al., 2013) and to allow a credible expert to make decisions on their behalf (Otto et al., 2016). On the other hand, process accountability has been shown to motivate adherence to display rules (Tunguz & Carnevale, 2011); in the present case, the salient display rule was the expectation to engage in "team self-correction."
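The arithmetic behind this pattern can be made explicit with a brief, hypothetical example of a self-correction index defined as the share of discussed problems raised by team members rather than by the facilitator (the exact operationalization used in the study may differ):

```python
# Hypothetical self-correction index: member-raised problems as a share of all
# problems discussed. Not the authors' exact operationalization.
def self_correction_ratio(member_raised: int, facilitator_raised: int) -> float:
    total = member_raised + facilitator_raised
    return member_raised / total if total else 0.0

# Facilitator contributions are held constant across methods, as observed;
# members raise more problems under TDT, driving the index upward.
print(self_correction_ratio(member_raised=6, facilitator_raised=4))  # 0.6 (TDT-like)
print(self_correction_ratio(member_raised=1, facilitator_raised=4))  # 0.2 (chronological-like)
```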

Distribution of Speech Turns

As hypothesized, team members showed a more democratic pattern of turn-taking during TDT debriefings than they did during chronological ones. Moreover, teams whose members took more evenly distributed speech turns identified a greater breadth of unique problems for discussion. These findings are consistent with those from previous research in which cooperative interdependence led teams to be more even-handed in weighing the value of members’ uniquely held information (Super et al., 2016). Importantly, our qualitative review confirmed that facilitators addressed each question in the TDT debriefing guide to the team collectively (as was scripted) rather than to specific individuals. Thus, it was not the case that TDT facilitators simply directed questions to individuals more evenly. Instead, team members voluntarily regulated their turn-taking in a more cooperative manner during TDT debriefings.
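One common way to index how evenly speech turns are distributed is a Gini coefficient computed over each member's turn count (0 indicates perfectly even turn-taking; values near 1 indicate domination by one speaker). The study does not specify its exact index here, so the sketch below is only one plausible choice.

```python
# Gini coefficient over per-member speech-turn counts (one plausible evenness index).
def gini(turn_counts: list[int]) -> float:
    counts = sorted(turn_counts)
    n, total = len(counts), sum(counts)
    if total == 0:
        return 0.0
    # Standard formula using the sorted cumulative distribution.
    cum = sum((i + 1) * c for i, c in enumerate(counts))
    return (2.0 * cum) / (n * total) - (n + 1.0) / n

print(gini([20, 2, 2, 2, 2]))  # ~0.51: one member dominates
print(gini([6, 6, 6, 6, 6]))   # 0.0: perfectly even turn-taking
```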

Open Communication Climate

Participants in TDT debriefings perceived their climate to be more open than did those in chronological debriefings. In this way, the present quasi-experiment responded to calls for more research on the impact of reflexivity interventions on affective (vice cognitive) states (Konradt et al., 2016). Open communication climate has been linked previously to collaborative learning (Martínez-Córcoles et al., 2012) as well as to knowledge transfer both within (Hofhuis et al., 2016) and between teams (Mueller, 2014). Moreover, participants who feel free to dissent during their debriefings report greater satisfaction, particularly when the event being debriefed was highly ambiguous (Scott et al., 2013).

Priming openness has been shown to decrease individuals' discomfort with complex information (Amit & Sagiv, 2013) and to build their confidence in working with a teammate again in the future (Tjosvold et al., 2005). Perceptions of open communication climate emerging from TDT debriefings may partially mediate the impact of this strategy on subsequent team performance (Smith-Jentsch et al., 2008) and may have longer-lasting effects on employees' willingness to voice their concerns. Additional research is needed to explore these possibilities.

Contextual Variables

Prior Performance

Prior research has shown that teams engage in quantitatively less reflexivity following an episode in which they performed well than following one in which they performed poorly (Li et al., 2021). Results from the present research extend these findings by demonstrating that teams also exhibit shallower reflexivity (i.e., shorter and less evenly distributed debriefings) when debriefing a performance episode in which they performed well. This may partially explain why quantitative measures of reflexivity are less positively associated with performance improvement for high-performing teams (Schippers et al., 2013). However, we also found that debriefing method moderated this effect. Specifically, the tendency of high-performing teams to exhibit less even participation was only observed during chronological debriefings. In TDT debriefings, high- and low-performing teams demonstrated similar tendencies to cooperatively "share the floor." Theory and research associated with the MIP-G model can explain this interaction. Specifically, when the goal of a team debriefing is to attribute blame for errors that were consequential to a known performance outcome, team members should have little motivation to participate unless they were proximal to one of those errors. The greater the number of consequential errors made, however, the smaller the number of members who can avoid participating altogether; this should attenuate within-team disparities in participation. By contrast, when the goal of a debriefing (as in TDT) is to learn from problems that have not yet become consequential as well as those that already have, participation should be more evenly distributed regardless of prior performance.

Team Size

Previous research has largely failed to find a moderating effect of team size on the impact of reflexivity interventions (Lines et al., 2021), including team debriefings specifically (Keiser & Arthur, 2021; Tannenbaum & Cerasoli, 2013). We likewise failed to find a significant relationship between team size and our objective indicators of team reflexivity depth. This is encouraging given that team size was exceptionally large and variable in the present research (5–26 members). Team size was, however, a unique predictor of individual-level perceptions of open communication climate: debriefings involving larger teams were perceived by participants as more open. This is consistent with prior research finding that larger teams engage in greater cognitive conflict than smaller teams (Amason & Sapienza, 1997).

Team Task Experience

Teams with higher task experience held longer debriefings, engaged in greater team self-correction, and raised a greater breadth of problems for discussion, regardless of debriefing method. These findings are consistent with the notion put forth by earlier authors (Keiser & Arthur, 2021) that more experienced teams may be better able to critique their own performance. However, more experienced teams also exhibited less evenly distributed speech turns than did less-experienced teams. The explanation for this is unclear. In the context of the present study, more experienced teams were also more functionally diverse (i.e., specialists versus general operators). Team faultlines based on functional diversity have been shown to have negative impacts on team performance in some (but not all) environmental circumstances (e.g., Cooper et al., 2014). Thus, it may be that teams of specialists in the present research exhibited less evenly distributed turns to speak due to their functional diversity rather than their level of task experience. Additional research is needed to explore this possibility.

Random Effects of Facilitator, Team, Participant

The random factor for the debrief facilitator was significant for only one of our five objective indicators of reflexivity depth: discussion pace. This is encouraging since the primary objective of a debriefing guide is to ensure quality control (e.g., Sawyer et al., 2016; Stoto et al., 2019). The effect of the facilitator on discussion pace is consistent with qualitative research that has documented differences in debrief facilitators’ “wait times” after asking a question, with many asking and immediately answering their own questions (Sellberg, 2018) or allowing insufficient wait time after asking a question that requires deep reflection (White et al., 2021). While questions and instructions can be scripted, wait times are at the discretion of the facilitator. Our findings emphasize the importance of incorporating active practice and feedback opportunities for facilitators, particularly with respect to “wait times” after asking questions.

In addition to debriefing method, the random factor for the individual participant explained significant variance in perceptions of open communication climate. The random factor for the team, however, did not. This is likely because we studied ad hoc teams that had not yet developed their own unique norms for openness prior to their debriefing. These findings are consistent with the theory that individuals perceive the world through their own unique lens (James et al., 2008) and support our decision to measure climate at the individual (vice team) level of analysis. Moreover, they suggest that TDT can be used to address the unique challenge of promoting open communication within teams that have little shared history or future.

Practical Implications

Prior research has demonstrated that TDT debriefings are relatively more successful than chronological ones at promoting cognitive (i.e., mental models) and behavioral (i.e., team performance) training outcomes (Smith-Jentsch et al., 2008). Results from the present study suggest that they are also relatively more successful at motivating teams to communicate openly and to reflect deeply and cooperatively on their past performance. This is critically important for teams that work in highly variable task environments. However, when a transfer environment is stable, predictable, or unambiguous, chronological debriefings would likely be a better choice than TDT debriefings; examples include what is referred to as "mission rehearsal" or "just-in-time" training. Additionally, time constraints may, in some cases, make TDT debriefings impractical. TDT debriefings in the present study averaged 46 min in length, compared to 30 min for chronological debriefings. Chronological debriefings have been described as a method of choice when time to debrief is limited (Allen et al., 2010), which is often the case when teams debrief (e.g., Allen et al., 2018; Schippers et al., 2014; Stoto et al., 2019). However, our results suggest that the habitual use of chronological debriefings for the sake of expediency may inadvertently create a norm of shallow reflexivity and employee perceptions of climate that discourage voice.

TDT debriefings appear to be particularly effective (relative to chronological ones) at motivating distributed participation in high-performing teams, whose members do not otherwise have a strong reason to engage deeply. Simulation training for teams can be very costly, particularly for large teams, and debriefings are key to its instructional value (Shinnick et al., 2011). In this regard, TDT debriefings may help to ensure that investments in simulation time are not wasted on highly effective teams.

It is encouraging that the relative benefits of TDT compared to chronological debriefings were robust, unaffected by task experience or team size. Moreover, facilitators were able to implement the TDT method with reliable results after approximately a day and a half of training, using a debriefing guide that was generic with respect to team type and to the particulars of a performance event. This is a significant practical concern for many organizations. TDT should be particularly cost-effective for organizations seeking to ensure quality control across facilitators who vary widely in experience or training, as well as for those that experience high facilitator turnover over time (for whom efficient facilitator training is important). Finally, it is increasingly common for employees to be members of multiple teams concurrently or, as in the present study, to work with different teammates on a day-to-day basis. The use of a common debriefing method/tool should reduce the process loss and uncertainty associated with membership shifts. As employees come to expect open communication during TDT debriefings, they may also generalize these norms to debriefings with unfamiliar teammates or across organizational silos, making inter-team debriefings more effective.

Study Limitations and Directions for Future Research

The present research employed several design features that lend confidence to the internal validity of our findings. For instance, the length of the simulated performance episodes being debriefed was consistent (i.e., 3 h); TDT debriefings were highly scripted, and facilitators' adherence to the method was verified via audio recordings. Additionally, our measures were collected from multiple sources (i.e., audio recordings, instructor observations, transcripts, and team member perceptions); HLM was used to control for variance explained by several random factors (e.g., ad hoc team membership, facilitator, individual team member); and several contextual variables were statistically controlled (i.e., team size, prior performance, task experience). Nonetheless, it is important to acknowledge several study limitations.

Most notably, while each uniquely composed ad hoc team participated in our research only once, individual members served on multiple teams. This was an unavoidable constraint of the environment in which data were collected and reflected "business as usual" for the organization. We were able to statistically control the random effects of team composition in our individual-level analyses; however, this was not possible in our team-level analyses. The pattern of effects for debriefing method was consistent (and in the theoretically supported direction) across all five objective indicators of reflexivity depth, and MANOVA was used to control family-wise error. Nonetheless, future research should attempt to replicate these findings in a context in which the independence of team membership can be ensured.
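For concreteness, the omnibus test described above could be sketched as a one-way MANOVA across the five team-level indicators; the column names below are hypothetical and this is not the authors' analysis code.

```python
# Hypothetical one-way MANOVA: debriefing method predicting all five
# reflexivity indicators jointly, guarding family-wise error before
# indicator-level follow-ups.
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

df = pd.read_csv("team_level_indicators.csv")  # hypothetical data file
mv = MANOVA.from_formula(
    "length + pace + breadth + self_correction + turn_evenness ~ method",
    data=df,
)
print(mv.mv_test())
```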

Second, the fluid and often last-minute assignment of individuals to teams for training limited our ability to collect self-report data from all participants. Additionally, time constraints prevented us from collecting direct measures of the motivational constructs theorized to drive our objective indicators (e.g., EM or NfCC); the present research could therefore be extended by including such measures. With respect to external validity, the organization in which we tested our hypotheses is unique in ways that may limit generalizability to other contexts. Specifically, team members were predominantly male and performed highly technical tasks within an HRO. Finally, teams conducted their debriefings via audio channels only, immediately following a simulation exercise in which communication between members was itself primarily audio-based. Future research is needed to investigate the potential moderating effects of communication mode and of the time delay between performance episode and debrief.

Conclusion

Team debriefings provide opportunities for organizations to develop norms of deep reflexivity and open communication. Unfortunately, they do not always live up to this potential. We compared two methods of debriefing with respect to these goals. The first, a chronological debrief, is one of the most common methods employed across industries and organizations. The second, Team Dimensional Training (TDT), places a relatively greater emphasis on process versus outcome accountability, and on cooperative versus competitive interdependence. Our findings revealed that TDT debriefings result in deeper team-level reflexivity and more positive individual-level perceptions of open communication climate than do chronological debriefings. In this way, the present research responds to calls for practical tools and strategies that can be used to enhance and to control the quality of team debriefings within organizations and across facilitators and teams.