An Unmet Goal? A Scale Development and Model Test of the Role of Inclusive Leadership

Roberts and Mayo (2019) emphasize many of the diversity, equity, and inclusion (DEI) efforts are falling short on creating inclusive workplaces, especially for people of color, and Caver and Livers (2020) echo this. Indeed, 69% of Black HR professionals reported their organization “is not doing enough to provide opportunities for Black employees” (SHRM, 2020). Further, discrimination continues to occur in the workplace (King et al., 2023; Lloyd, 2021), limiting the extent to which all workers can fully contribute and thereby the extent that workplaces can be truly inclusive (Ferdman, 2014; Shore et al., 2018). To help combat this lack of progress, research and practice have pointed to inclusive leadership (IL) (Leading Effectively Staff, 2022; Randel et al., 2018; Shore & Chung, 2022; Westover, 2020). And, while there has been progress in conceptualizing frameworks and models regarding the role of IL in fostering inclusive workplaces (see Ferdman, 2014 and Randel et al., 2018, respectively), there is still a dearth of empirical research examining IL’s impact on fostering workplaces that support members of marginalized groups. Thus, this study seeks to explore how IL may impact experiences of inclusion—defined as when “people of all identities and many styles can be fully themselves while also contributing to the larger collective, as valued and full members” (Ferdman, 2014, p. 235)—for members of marginalized groups while also identifying how IL may promote behaviors that support and uphold inclusion in the workplace, the latter being an especially noticeable gap in the literature.

One major obstacle to examining IL’s impact on supporting members of marginalized groups is the lack of theoretically driven measurement of IL. In 2018, Randel and colleagues put forth a theoretically grounded conceptualization of IL, including a model depicting its impact on workgroup members’ perceptions of inclusion, their psychological outcomes (e.g., empowerment), and ultimately their behaviors (e.g., job performance). In this work, Randel et al. (2018) define and describe IL in ways that are consistent with Shore and colleagues’ (2011) theory-driven depiction of inclusion involving the two components, valuing uniqueness and fostering belongingness, laying the groundwork for IL scale development. However, at the time of this study, an IL measure based on theoretically grounded conceptualizations of inclusion was lacking. Specifically, no IL measure targeted both inclusion components with the goal of fostering workplace diversity efforts. Thus, as an initial step toward filling this gap, we developed a practically useful 10-item IL measure based on the theoretical propositions of recent inclusion-related literature (i.e., Randel et al., 2018) and following best practices for measure development (e.g., Anderson & Gerbing, 1991; Crawford & Kelder, 2019; Hinkin, 1998).

Upon developing our 10-item IL scale, we sought to examine whether IL does indeed support members of marginalized groups by drawing from a seminal workplace inclusion framework by Ferdman (2014). Taking a broad, fairly comprehensive view of workplace dynamics, Ferdman (2014) put forth a multilevel (i.e., individual to societal), mutually reinforcing inclusion framework prescribing how to foster workplace inclusion. In this framework, Ferdman (2014) proposes that IL is the “linchpin” (p. 19) for inclusion at all other levels due to the implementation and reinforcement of critical practices. Therefore, it provides a framework for how IL may drive perceptions of inclusion of workgroup members (consistent with Randel et al.’s (2018) propositions regarding the impact of IL) and, extending beyond Randel et al.’s (2018) model, drive workgroup members’ behaviors that help foster inclusion in the workplace. In partial examination of Ferdman’s (2014) proposition, we sought to examine the effects of the leadership level (focusing on IL behaviors) on the two lowest levels of Ferdman’s (2014) framework: (a) interpersonal behaviors that foster inclusion and (b) individual experiences of inclusion. First, we examine whether IL predicts interpersonal behaviors that are important for fostering inclusion for members of marginalized groups: employee allyship and antiracist behaviors. These behaviors are expected to actively demonstrate support for members of marginalized groups and interrupt discriminatory behaviors and, accordingly, are likely to support and uphold inclusion in the workplace (e.g., Cheng et al., 2018; Liu, 2020). Although various works suggest certain behaviors broadly support inclusion (e.g., empathy; Wasserman, 2014), interpersonal behaviors that directly support marginalized groups, such as allyship and antiracist behaviors, may be especially important for fostering inclusion of said groups (e.g., Petronelli & Ferguson, 2022).
Second, consistent with both Ferdman’s (2014) and Randel et al.’s (2018) models, we examine whether IL predicts employees’ experienced inclusion. Indeed, Ferdman (2014) argues experienced inclusion is foundational to understanding the presence of inclusion within organizations. In our arguments, we extend these conceptual works by Ferdman (2014) and Randel et al. (2018) by drawing from social information processing (SIP; Salancik & Pfeffer, 1978) and social learning (SLT; Bandura, 1977) theories to explain how these relationships may occur.

Finally, beyond the direct impact of IL, we investigate whether IL indirectly fosters outcomes important for the performance and well-being of those in marginalized groups, thereby contributing to workplaces that foster their success. Indeed, Randel and colleagues (2018) propose that IL positively relates to follower perceptions of inclusion and these perceptions, in turn, relate to various psychological outcomes critical to employees. In this vein, we explore whether IL has an indirect effect on employee emotional exhaustion (a key factor of burnout, and thus of employee well-being; Maslach et al., 2001) and empowerment (a motivational construct important for effective performance; Spreitzer, 1995), as members of marginalized groups often face additional barriers related to these factors in comparison to other members (e.g., Burns et al., 2021; Caver & Livers, 2020). By examining these relationships, we aim to add to the body of evidence that inclusion, driven by IL, is more than “nice to have” and, instead, has a meaningful, positive impact on workgroup member functioning.

Taken together, this research contributes to the theory and practice of IL in three ways. First, we seek to provide a conceptually up-to-date and practically useful measure of IL to be leveraged in future research and practice. Second, in line with Ferdman’s (2014) conceptual framework and Randel et al.’s (2018) model, we contribute to the consensus on the value of exploring inclusion in the workplace by adding empirical evidence about the role of IL in driving inclusive workplaces. In heeding the critical concerns raised by diversity scholars (e.g., Roberts & Mayo, 2019; Shore & Chung, 2022), we aim to begin filling an important gap by addressing the role of IL specifically in supporting members of marginalized groups. Third, we contribute to theory-building on IL by identifying how and why IL may support psychological factors important for the well-being and performance of employees in marginalized groups. These aims are addressed in two phases. In Phase 1 of this study, we detail the development of a 10-item IL scale. In Phase 2, we test a conceptual model of proposed relationships using a multi-wave design and a heterogeneous sample of participants varying in racioethnicity and sex. Ultimately, through this study, we aim to provide evidence that inclusion, and IL specifically, does what it is intended to do: benefit employees in marginalized groups.

Inclusive Leadership

Within organizations, leaders are often expected to play a key role in fostering inclusion (Buengeler et al., 2018; Nishii & Mayer, 2009) and IL is especially critical to this end (e.g., Ferdman, 2014, 2021; Nembhard & Edmondson, 2006). Previous research supports the importance of leadership in fostering experienced inclusion (e.g., Gallegos, 2014; Randel et al., 2018). However, much of the empirical work has examined inclusion as a byproduct of other leadership styles. For instance, developing high-quality leader-member exchange (LMX) relationships has been linked to perceptions of inclusion (Boekhorst, 2015), whereas other research suggests that servant leadership can increase employee perceptions of inclusion (Gotsis & Grimani, 2016). In contrast, Randel et al. (2018) argue that IL is a distinct form of leadership that warrants additional research. Indeed, Randel et al. (2018, p. 195) explain key differences between IL and other forms of leadership (i.e., transformational, empowering, servant, authentic, leader-member exchange). For instance, Randel and colleagues note that “servant leadership focuses on developing and creating success for the members but not necessarily on tending to member needs for work group belonging or uniqueness” (p. 195; see Footnote 1).

Randel and colleagues (2018) define IL as “a set of positive leader behaviors that facilitate group members perceiving belongingness in the work group while maintaining their uniqueness within the group as they fully contribute to group processes and outcomes” (p. 190). This definition aligns with Shore et al.’s (2011) theoretically driven conceptualization of inclusion (based on Brewer’s (1991) optimal distinctiveness theory) and Ferdman’s (2014) work by targeting two important components of inclusion: (a) fostering a feeling of belongingness and (b) displaying value for uniqueness. Thus, we operationalize IL by developing and validating a scale consistent with Randel and colleagues’ definition due to its theoretical grounding. More specifically, we focus on measuring IL behaviors that foster feelings of belongingness and demonstrate value for uniqueness. Indeed, Randel and colleagues (2018) propose that inclusive leaders foster belongingness by engaging in behaviors related to (a) supporting workgroup members, (b) ensuring justice and equity, and (c) facilitating shared decision-making. Further, Randel and colleagues (2018) suggest inclusive leaders demonstrate value for uniqueness by (a) encouraging diverse contributions and (b) helping group members fully contribute. Importantly, however, while Randel et al. (2018) propose that different leader behaviors are necessary to foster feelings of belongingness and of being valued for uniqueness, the authors argue that workgroup members must experience both for leaders to be considered inclusive. Indeed, Shore et al. (2011) discuss the impact on workgroup members if one component is missing (e.g., conforming to dominant cultural norms if value for uniqueness is not felt).

Although previous studies have empirically investigated IL (e.g., Carmeli et al., 2010; Choi et al., 2015; Choi et al., 2017; Chung et al., 2020), these studies have not defined and measured IL based on more recent conceptualizations of inclusion (i.e., consisting of valuing uniqueness and fostering belongingness; Shore et al., 2011). Indeed, Randel and colleagues (2018, p. 191) state “none has adequately addressed these fundamental needs of group members to belong and to be valued for uniqueness.” For instance, Carmeli et al.’s (2010) measure of IL (hereafter referred to as IL-C to distinguish between IL scales) is often used in recent inclusion-related research (e.g., Chung et al., 2020); however, the authors developed it as a means to foster psychological safety rather than to support workplace diversity (see Footnote 2). Indeed, the dimensions of this scale are intended to measure perceptions of a leader’s openness, accessibility, and availability. Although these dimensions may be useful leadership behaviors, they lack theoretical relationships to inclusion as it is understood today, where inclusion is intended as a mechanism to support diversity in the workplace and is experienced when workgroup member needs of belongingness and uniqueness are fulfilled (Shore et al., 2011). Randel and colleagues’ (2018) definition of IL fills this theoretical gap by identifying leader behaviors that support both needs. Further, though not inconsistent with Nembhard and Edmondson’s (2006) conceptualization of IL (“…words and deeds by a leader or leaders that indicate an invitation and appreciation for others’ contributions,” p. 947), Randel and colleagues’ (2018) definition goes beyond this by including belongingness and valuing individuals’ personal identities irrespective of workplace contributions.

Consistent with Randel and colleagues’ work, we sought to focus the scale on leader behaviors rather than perceptions of a leader’s inclusiveness. Indeed, Randel et al.’s (2018) work provides sets of behaviors in which leaders engage that are predicted to result in follower experiences of inclusion. By measuring these behaviors, we focus on whether leaders are taking an active rather than passive (e.g., being “accessible,” Carmeli et al., 2010) approach to fostering inclusion. Additionally, Fischer and Sitkin (2023) argue the need for leadership measures to maintain a distinction between what the leader does and the effects on the follower to prevent causal indeterminacy. Indeed, previous measures of IL that focus on follower perceptions of the leader may conflate leader behavior with the behavioral effects (e.g., perceptions that a leader is “open,” Carmeli et al., 2010). Thus, a measure of IL that is behaviorally based may be more appropriate for examining the effects on follower-related outcomes, such as experiences of inclusion.

Of note, since this study was conducted, a new and conceptually up-to-date IL scale has been developed (see Al-Atwi & Al-Hassani, 2021). However, we find that the scale’s length (25 items) limits its practical usefulness. Further, Al-Atwi and Al-Hassani’s (2021) scale has yet to be validated in a racially diverse population. Thus, we contend that our scale, which was likely developed concomitantly, is still practical for the purpose of this study and that its validity evidence may generalize more appropriately to racially diverse workplaces. Ultimately, based on the discussion above, we sought to develop a behaviorally based and up-to-date IL scale that would be especially valuable for U.S.-based organizations struggling with DEI efforts by keeping it fairly brief, and thus practical, and validating it based on evidence from U.S. employees varying in race and sex.

Phase 1: Scale Development

Within Phase 1, we first present the initial IL scale item development and reduction. Next, we present the psychometric properties of the scale in five distinct samples. Finally, we present initial convergent and discriminant validity of our IL measure.

Item Development and Reduction

Items were generated using a deductive approach (Hinkin, 1998), based on the subdimensions of IL developed by Randel et al. (2018) and the examination of additional relevant resources on inclusion (e.g., Ferdman & Deane, 2014; Shore et al., 2011; Shore et al., 2018). Randel et al. (2018) outlined five groups of IL behaviors: supporting individuals as group members, ensuring justice and equity, sharing decision-making, encouraging diverse contributions, and helping group members contribute fully. Recommended item writing guidelines were followed (Hinkin, 1998). In accordance with Crawford and Kelder (2019), members of the research team not involved in prior development steps reviewed the items for coverage of the content domain (contamination and deficiency) and for grammatical and spelling errors. Items were edited as appropriate, resulting in a set of 51 items comprising 8–14 items per group of IL behaviors. Items were then assessed for content adequacy via an item-sort task (Anderson & Gerbing, 1991).

Participants

Twenty-one participants were recruited to complete the item-sort task, which is considered an adequate sample size (Anderson & Gerbing, 1991) and is above the minimum of nine participants recommended by Crawford and Kelder (2019). Participants were 43% male and 90% white, with a mean age of 38 (SD = 12.64). According to Crawford and Kelder (2019), participants should have expertise in research or academic insight. Thus, participants were enrolled in related graduate programs, worked in academia, or were consultants with research backgrounds. These participants were given a list of the 51 items in random order and the definitions for each group of IL behaviors. Participants were asked to assign each item to the group it best represented.

Item Reduction Results

The data were analyzed by calculating the proportion of substantive agreement (PSA) and coefficient of substantive validity (CSV) indices for each item (Anderson & Gerbing, 1991). Items with PSA values greater than 0.75 and CSV values greater than 0.60 were retained. These cutoffs were chosen based on Colquitt et al.’s (2019) guidelines for cases where orbiting constructs are strongly correlated with each other, which suggest that these cutoffs represent “strong” evidence for content validation. This resulted in a set of 27 items, with five to eight items in each set of behaviors (PSA mean = 0.88, SD = 0.08; CSV mean = 0.78, SD = 0.13). Next, to reduce the length of the scale for practical use, the researchers independently selected the items that best reflected the behaviors with minimal overlap with other items. The selections were reviewed for agreement and discussed. This process resulted in a 10-item scale, presented in Table 1.
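The two indices can be illustrated with a short calculation. The sketch below assumes the standard definitions from Anderson and Gerbing (1991): PSA is the proportion of judges who assign an item to its intended construct, and CSV is the difference between that count and the largest count for any other single construct, divided by the number of judges. The function names, category labels, and votes are hypothetical and are not data from this study.

```python
from collections import Counter


def substantive_validity(assignments, intended):
    """Return (PSA, CSV) for a single item.

    assignments: list of category labels, one per judge (hypothetical data)
    intended:    the category the item was written to represent
    """
    n = len(assignments)
    if n == 0:
        raise ValueError("at least one judge is required")
    counts = Counter(assignments)
    # n_c: judges who sorted the item into its intended category
    n_c = counts.get(intended, 0)
    # n_o: the largest number of judges choosing any one other category
    n_o = max((c for label, c in counts.items() if label != intended), default=0)
    psa = n_c / n
    csv_value = (n_c - n_o) / n
    return psa, csv_value


def retain(psa, csv_value, psa_cut=0.75, csv_cut=0.60):
    """Apply the retention cutoffs stated in the text (strictly greater than)."""
    return psa > psa_cut and csv_value > csv_cut


# Hypothetical item sorted by 21 judges: 18 as intended, 2 and 1 elsewhere.
votes = ["justice"] * 18 + ["support"] * 2 + ["decision"] * 1
psa, csv_value = substantive_validity(votes, "justice")
print(round(psa, 2), round(csv_value, 2), retain(psa, csv_value))
```

Under these assumptions, an item sorted as intended by 18 of 21 judges would exceed both cutoffs, whereas an item on which a second category attracted a sizable share of judges could pass the PSA cutoff and still fail on CSV.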

Table 1 Inclusive leadership scale with dimensions

Psychometric Properties of the IL Scale

Next, psychometric properties of the IL scale were examined using confirmatory factor analysis (CFA; see Footnote 3) on the basis of data from five distinct samples.

Participants

Five samples were recruited from various sources: 190 participants through MTurk in 2018, 211 participants through MTurk in 2022, 106 adult learners in an Organizational Leadership Master’s program (data collected from 2019 to 2021), 161 employees from a Midwest university (data collected in 2021), and 205 employees from a Midwest manufacturing company (data collected in 2022; demographics in Table 2; see Footnote 4). The samples were screened for insufficient effort responding using several detection methods. For instance, MTurk data collection efforts used a priori methods, such as instructed response items (e.g., “Please select strongly agree”) and bogus items (e.g., “I eat concrete”) (Ward & Meade, 2023). In the remaining data collection efforts where these a priori methods were not deemed appropriate for the context (e.g., personal development, client engagement), response times were examined for excessively fast responses (e.g., < 2 s per item; Bowling et al., 2016) and open-ended responses were screened to ensure sensible responding. Sample sizes were considered appropriate as there were more than three indicators per factor (10 items to 1 IL factor) and sample sizes above 200 can prevent convergence failures and improper solutions (Anderson & Gerbing, 1984; Kelloway, 2015).
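The screening checks described above can be sketched as a simple flagging routine. This is an illustration only: the record layout, the 1–5 response coding, the field names, and the rule that agreement with a bogus item counts as a flag are all assumptions made for the example, and in the study the a priori items and the response-time check were applied to different samples rather than combined.

```python
def ier_flags(record, n_items, instructed_key=5, agree_from=4,
              min_sec_per_item=2.0):
    """Return a list of insufficient-effort flags for one respondent.

    record: dict with optional keys (all hypothetical):
        "instructed" - response to an instructed item such as
                       "Please select strongly agree" (coded 1-5)
        "bogus"      - response to a bogus item such as "I eat concrete"
        "seconds"    - total time spent on the n_items survey items
    A key that is missing or None means that check was not administered.
    """
    flags = []
    instructed = record.get("instructed")
    if instructed is not None and instructed != instructed_key:
        flags.append("instructed_item_failed")
    bogus = record.get("bogus")
    if bogus is not None and bogus >= agree_from:
        flags.append("bogus_item_endorsed")
    seconds = record.get("seconds")
    if seconds is not None and seconds / n_items < min_sec_per_item:
        flags.append("too_fast")
    return flags


# Hypothetical respondents on a 10-item survey.
careful = {"instructed": 5, "bogus": 1, "seconds": 120}
careless = {"instructed": 3, "bogus": 5, "seconds": 12}
print(ier_flags(careful, n_items=10))
print(ier_flags(careless, n_items=10))
```

How flagged cases are handled (for example, removal after one flag or only after several) is a separate decision that this sketch does not make.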

Table 2 Demographics of samples used to evaluate psychometric properties

Confirmatory Factor Analysis Results

Prior to conducting the CFA, items were assessed for normality by examining skewness and kurtosis. Both coefficients for all items were below the absolute value of 2.0 and, therefore, were deemed acceptable (Ferguson & Cox, 1993). CFAs with maximum likelihood (ML) estimation were conducted using the lavaan package (v. 0.6–14; Rosseel, 2012) in RStudio. CFAs from all samples demonstrated good fit with standardized factor loadings above 0.72 (Hu & Bentler, 1999; see Table 3 and Footnote 5). Additionally, we tested an alternative CFA model with two dimensions, belongingness and uniqueness, based on Randel et al.’s (2018) model. The fit indices were similar to the unidimensional model (χ2 (34) = 65.87, p = .001, CFI = 0.98, TLI = 0.97, SRMR = 0.03, RMSEA = 0.07 [0.04, 0.10]); however, the interfactor correlation was extremely high (r = .97). We then tested a bifactor model while setting the dimensions as orthogonal (Chen et al., 2012), and the model fit results were as follows: χ2 (25) = 34.70, p = .094, CFI = 0.99, TLI = 0.99, SRMR = 0.02, RMSEA = 0.05 [0.00, 0.08]. While the model fit results were stronger in the bifactor model, the factor loadings for each dimension were mostly below the 0.40 factor loading threshold (ranging from |0.08| to |0.36| for belongingness and 0.11 to 0.62 for uniqueness), and they were also stronger on the overall bifactor IL dimension (ranging from 0.78 to 0.92). Based on these results, we continue to test hypotheses with the unidimensional model. However, we also still report the items for the two-dimensional model (belongingness and uniqueness; see Table 1) to represent the theoretical conceptualization (Randel et al., 2018) and allow for future research to further examine dimensionality of the measure.
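The normality screen described at the start of this subsection can be sketched as follows. The example assumes moment-based estimators and excess kurtosis (so that a normal distribution has a kurtosis of 0); statistical packages often apply bias corrections, so values may differ slightly from those a given package reports. The function names and the item responses are hypothetical.

```python
def skew_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis for one item's responses."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two responses")
    mean = sum(values) / n
    # Central moments (population form, dividing by n)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("responses have zero variance")
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis


def within_bounds(values, limit=2.0):
    """True when both coefficients are below the absolute limit used in the text."""
    skew, kurt = skew_excess_kurtosis(values)
    return abs(skew) < limit and abs(kurt) < limit


# Hypothetical responses to one item on a 1-5 scale.
item = [5, 4, 4, 3, 5, 2, 4, 5, 3, 4, 1, 4]
print(skew_excess_kurtosis(item), within_bounds(item))
```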

Table 3 Summary of confirmatory factor analysis statistics for IL scale

Further, due to our interest in particular demographic groups, we tested measurement invariance across those demographic groups. Specifically, we conducted two tests of measurement invariance: gender and race (see Footnote 6). First, we tested configural invariance (i.e., factor structure), and results supported configural invariance for both (a) gender (χ2 (70) = 117.86, p < .001, CFI = 0.97, TLI = 0.96, SRMR = 0.03, RMSEA = 0.08 [0.06, 0.11]) and (b) race (χ2 (70) = 133.76, p < .001, CFI = 0.96, TLI = 0.95, SRMR = 0.04, RMSEA = 0.09 [0.07, 0.12]). Second, we tested metric invariance (i.e., equal factor loadings), and results supported metric invariance for both (a) gender (χ2 (79) = 121.59, p = .001, CFI = 0.97, TLI = 0.97, SRMR = 0.04, RMSEA = 0.07 [0.05, 0.10]) and (b) race (χ2 (79) = 145.48, p < .001, CFI = 0.96, TLI = 0.95, SRMR = 0.06, RMSEA = 0.09 [0.07, 0.11]). Finally, we tested scalar invariance (i.e., equal item intercepts), and results supported scalar invariance for both (a) gender (χ2 (88) = 131.68, p < .001, CFI = 0.97, TLI = 0.97, SRMR = 0.05, RMSEA = 0.07 [0.04, 0.09]) and (b) race (χ2 (88) = 162.35, p < .001, CFI = 0.95, TLI = 0.95, SRMR = 0.07, RMSEA = 0.09 [0.07, 0.11]).
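Because the configural, metric, and scalar models are nested, successive models can be compared through the change in χ2, degrees of freedom, and CFI. The sketch below only performs that arithmetic on the fit statistics reported above. The decision rule used to judge invariance is not stated in this section, so no cutoff is applied, and because CFI is reported to two decimals the ΔCFI values are coarse. The `Fit` class and `delta` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    chisq: float
    df: int
    cfi: float


def delta(less_constrained, more_constrained):
    """Change in fit when moving to the more constrained (nested) model."""
    return {
        "d_chisq": round(more_constrained.chisq - less_constrained.chisq, 2),
        "d_df": more_constrained.df - less_constrained.df,
        "d_cfi": round(more_constrained.cfi - less_constrained.cfi, 2),
    }


# Fit statistics as reported in the text: configural, metric, scalar.
models = {
    "gender": [Fit(117.86, 70, 0.97), Fit(121.59, 79, 0.97), Fit(131.68, 88, 0.97)],
    "race": [Fit(133.76, 70, 0.96), Fit(145.48, 79, 0.96), Fit(162.35, 88, 0.95)],
}

for group, (configural, metric, scalar) in models.items():
    print(group, "metric vs configural:", delta(configural, metric))
    print(group, "scalar vs metric:", delta(metric, scalar))
```

Each step adds nine constraints (Δdf = 9), which matches the move from 70 to 79 to 88 degrees of freedom in the reported models.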

Convergent and Discriminant Validity Evidence for IL Scale

To establish convergent validity evidence, the relationships of IL with measures of several theoretically related constructs were explored. Specifically, we expected that our IL measure would be positively related to another measure of IL, the IL-C (Carmeli et al., 2010). However, their measure focuses on manager openness (which may relate to the IL dimension of encouraging diverse contributions), availability, and accessibility (both of which may relate to IL components of supporting group members). Additionally, IL was expected to positively relate to perceived supervisor support (PSS). PSS is indicative of whether employees perceive that their supervisors both care about them and value their contributions (Kottke & Sharafinski, 1988), which align with the supporting group members and encouraging diverse contributions dimensions of IL, respectively. Finally, IL was expected to negatively relate to destructive leadership, involving behaviors related to unethical actions, poor decision-making, bullying, and over-controlling (Shaw et al., 2012). These are in opposition with the underlying theme of IL, specifically related to belongingness. To establish discriminant validity evidence, the relationships of IL and measures of theoretically unrelated constructs were examined. Whether leaders are inclusive is expected to be unrelated to individuals’ tenure within their current company or their age; therefore, no relationship was expected for those variables.

Measures

For convergent validity, we used Carmeli et al.’s (2010) measure of IL (IL-C), consisting of nine items on a five-point scale (1 = not at all; 5 = to a large extent). PSS was measured using a modified (replacing organization with supervisor) perceived organizational support scale (Eisenberger et al., 1986). Items were measured on a seven-point scale (1 = strongly disagree; 7 = strongly agree). Consistent with previous research (Shanock & Eisenberger, 2006), four dimensions of Shaw et al.’s (2012) destructive leadership scale were measured: (a) making decisions based on inadequate information (five items), (b) acting in a brutal bullying manner (seven items), (c) lying and other unethical behavior (seven items), and (d) micro-managing and over-controlling (seven items). Items were measured on a five-point scale (1 = strongly disagree; 5 = strongly agree).

For discriminant validity, we measured tenure. To measure tenure in both the MTurk and student samples, participants were asked to indicate how long they had worked for their current employer on a six-point scale (1 = less than one year; 6 = 21+ years), with a not applicable option available for those who were unemployed in the MTurk sample only. The individuals who chose not applicable were removed from the analyses (n = 16; see Table 4 for a list of measures by sample).

Table 4 Overview of studies and measures used

Convergent and Discriminant Validity Results

For initial evidence of convergent and discriminant validity, results from zero-order correlations using data from the MTurk and student samples supported the expected relationships (see Tables 5 and 6, respectively, along with scale descriptives and reliabilities; see also Footnote 7).

Table 5 Descriptives and zero-order correlations from MTurk 2018 sample
Table 6 Descriptives and zero-order correlations from Organizational Leadership Program sample

To determine whether IL was separable from the constructs with which it significantly correlated, we conducted additional model comparisons using CFA. Specifically, for three constructs (utilizing the MTurk 2018 sample [N = 190]), a two-factor model fit significantly better than a model in which the covariance between the two latent factors was fixed to 1.0: Carmeli et al.’s (2010) IL-C scale (Δχ2 [1] = 215.93, p < .001), destructive leadership (Δχ2 [1] = 421.9, p < .001), and PSS (Δχ2 [1] = 335.18, p < .001). These results suggest the IL measure is a separate construct from Carmeli et al.’s (2010) IL-C scale, destructive leadership, and PSS. Further, following recommendations of examining ΔCFI > 0.002 (Meade et al., 2008), the ΔCFI values for the models including Carmeli et al.’s (2010) IL-C scale (−0.067), destructive leadership (−0.184), and PSS (−0.079) all suggested discriminant validity. Finally, the interfactor correlations between our IL scale and Carmeli et al.’s (2010) IL-C scale (0.86 [CI: 0.82, 0.91]), destructive leadership (−0.51 [CI: −0.62, −0.39]), and PSS (0.82 [CI: 0.76, 0.87]) were examined. Taken together, results suggest that there may be marginal problems with discriminant validity, depending on the technique used, but overall, the constructs do have some evidence for distinctiveness (Rönkkö & Cho, 2022).

Phase 2: Model Test of Inclusive Leadership’s Impact

In Phase 1, we established the psychometric properties of the IL scale, as well as the convergent and discriminant validity of IL. In Phase 2, we turn to the arguments for our formal hypotheses and a test of our conceptual model (see Fig. 1). The primary goal of this model test is to demonstrate evidence of IL’s usefulness in fostering inclusion and supporting members of marginalized groups.

Fig. 1 Hypothesized model and results. Note. IL, inclusive leadership; WGI, workgroup inclusion; indirect effects are after the “/”; * p < .01

Inclusive Leadership’s Role in Supporting Members of Marginalized Groups

In accordance with Ferdman’s (2014) multilevel inclusion framework, IL is expected to play a pivotal role in fostering inclusion at other levels within organizations. In this framework, Ferdman holds that inclusion can be conceptualized and fostered at the societal level (the highest level), followed by the organizational, leadership, group, interpersonal, and individual levels. Ferdman (2014) argues that inclusive leaders reinforce inclusive behaviors in others and directly foster experienced inclusion through their own inclusive interactions with others. Thus, in this study, we seek to explore the role IL has in fostering (a) employees’ inclusive interpersonal behaviors and (b) individuals’ experiences of inclusion.

To further conceptualize the connections between Ferdman’s (2014) framework and inclusive interpersonal behaviors and experiences of inclusion, we draw from SIP (Salancik & Pfeffer, 1978) and SLT (Bandura, 1977). According to SIP, social environments provide cues that employees use to make sense of their workplaces (Salancik & Pfeffer, 1978). More specifically, SIP suggests that employees’ perceptions and attitudes are shaped through the processing of cues in their social environments, which, in turn, shape their behaviors. An important source of these cues is expected to be leaders (Schneider et al., 2013). Indeed, Thomas and Griffin (1989) suggest leaders are likely sources of these cues directly, such as by making statements about the workplace, and indirectly, such as through interacting in ways that illustrate what is supported in the workplace. In a similar vein and in accordance with SLT (Bandura, 1977), employees, viewing their leaders as role models, may vicariously learn what behaviors are supported in the workplace through observation of the leaders’ interactions with others. Indeed, according to SLT, individuals can learn behaviors from observing and emulating behaviors of others. Although whom others observe and emulate may vary, leaders are expected to be particularly attractive models for employees to emulate because of their status and power (Brown & Trevino, 2014).

Inclusive Leadership and Interpersonal Behaviors

Inclusive interpersonal behaviors refer to the way employees interact with others around them to create experiences of inclusion (Ferdman, 2014). In the current study, we focus on interpersonal behaviors that target the inclusion of members of marginalized groups. To make a difference in establishing inclusive workplaces for those in marginalized groups, employees may need to be actively involved in supporting those groups, such as by standing up against injustices targeting those groups and their individual members. Although previous works have examined the impact of IL on fostering employee behaviors, these behaviors are often not connected to facilitating the inclusion of other employees. For instance, previous works have examined the role of IL in fostering employee voice (Guo et al., 2020; Jiang et al., 2020). Although this is an active behavior that could be beneficial for supporting inclusion, voice does not necessarily indicate the support of members of marginalized groups or the promotion of inclusion for those members. Thus, taking an active role in specifically supporting marginalized groups (allyship behaviors) and interrupting forms of discrimination and prejudice when encountered (antiracist behaviors, e.g., Pollock, 2008) may be particularly meaningful for supporting and upholding inclusion. Therefore, we explore the role of IL in fostering interpersonal behaviors in the form of allyship for marginalized groups and antiracism.

Allyship is an ongoing commitment to leverage privilege in renouncing discriminatory assumptions and practices and focusing on the priorities of marginalized counterparts (Erskine & Bilimoria, 2019). In accordance with Ostrove and Brown (2018), allyship involves actively supporting causes relevant to the marginalized groups to which individuals are allied. For instance, Ostrove and Brown (2018) have identified allyship behaviors specific to supporting people of color, such as affirming attitudes (e.g., communicating caring and respect for people of color) and taking informed action (e.g., taking part in activism). Allyship is expected to help promote equity and cultural change in support of marginalized groups (Cheng et al., 2018).

Antiracism is the “action taken by a person or persons (not directly involved as a target or perpetrator) to speak out about or to seek to engage others in responding (either directly or indirectly, immediately or at a later time) against interpersonal or systemic racism” (Nelson & Dunn, 2011, p. 265). Thus, antiracism in this context is more reactive to acts and systems of racism compared to allyship. Although antiracism has a narrower focus than allyship, we contend that a focus on race is meaningful for targeting all marginalized groups because (a) racism continues to be a substantive problem within organizations, especially U.S.-based organizations (Roberts et al., 2019), and (b) the “curb-cut effect” (Blackwell, 2016) suggests that countering disadvantages for one group (in this case, employees with marginalized racioethnic identities) can benefit others.

Ultimately, employees engaging in these allyship and antiracist behaviors should foster inclusive workplaces by supporting equity and interrupting injustices in the workplace, respectively. As allyship and antiracism behaviors are consistent with aspects of inclusion (e.g., ensuring justice and equity), to the extent that leaders behave in inclusive ways, employees may be likely to interpret that the workplace values allyship and antiracism behaviors (consistent with SIP) and/or learn behaviors consistent with allyship and antiracism (consistent with SLT) and, ultimately, may be more likely to engage in these behaviors themselves.

  • Hypothesis 1: IL is significantly and positively related to employee allyship.

  • Hypothesis 2: IL is significantly and positively related to employee antiracism.

IL and Experienced Inclusion

In addition to promoting inclusive interpersonal behaviors, a key question is whether IL impacts employees’ experienced inclusion. Indeed, Ferdman (2014) suggests experienced inclusion “plays a key role in assessing inclusion’s existence or potency” (p. 16). Consistent with previous literature (Chung et al., 2020; Ferdman, 2014; Shore et al., 2011), we expect experienced inclusion to manifest in two important ways: through perceived workgroup inclusion (WGI) and authenticity. We again draw from SIP and SLT in making arguments for the impact of IL on employee inclusion.

WGI is conceptualized as whether employees feel they are treated as if they belong to the group and are valued for their uniqueness by members of their workgroup (Chung et al., 2020). Nishii and Leroy (2021) refer to the workgroup as “the most critical context for determining experiences of inclusion” (p. 162) as employees likely experience most of their interactions within this context. Further, seminal work on inclusion by Shore and colleagues (2011) discusses the importance of the workgroup for shaping the experience of inclusion, as one’s immediate environment is expected to be especially impactful for their experiences. Similar to the arguments above regarding SIP and SLT, IL behaviors may lead to greater workgroup inclusivity as group members interpret leader behaviors as signaling group norms and model their leaders’ inclusive behaviors. Indeed, Randel and colleagues (2018) suggest perceptions of inclusion are facilitated by inclusive leaders “serving as a role model and reinforcing such behaviors among group members” (p. 192). Thus, upon experiencing IL, group members may be more likely to behave inclusively (fostering belongingness among group members, demonstrating value for group members’ uniqueness) when interacting with other members of the group, fostering perceptions of WGI. Indeed, previous research has provided initial support for the relationship between IL and WGI (Chung et al., 2020; see Footnote 8).

  • Hypothesis 3: IL is significantly and positively related to WGI.

Authenticity may be defined as “the degree to which a person acts in agreement with one’s true self” (van den Bosch & Taris, 2013, pp. 1 & 2). Whether employees feel as if they can be fully authentic at work is an important indicator of experienced inclusion. Centered in experienced inclusion is the idea that people should not feel as if they have to withhold core aspects of their identity to belong (Ferdman, 2014; Shore et al., 2011). According to Rogers (1965), to function at their fullest potential, one must feel authentic. However, evidence shows employees of marginalized groups often feel inauthentic in the workplace or penalized for being authentic as they face racist challenges, like microaggressions and professional norms based on whiteness (Roberts & Mayo, 2019). Thus, authenticity may indeed be a critical employee outcome indicative of truly inclusive workplaces, such that authenticity represents an aspect of perceived inclusion (Jansen et al., 2014). Further, IL behaviors should encourage followers to share their own perspectives and experience psychological safety, providing an opportunity for them to practice their authentic self (Shore & Chung, 2022). As van den Bosch and Taris (2013) discuss, authenticity can be influenced by social context, including leader behaviors. Again, consistent with SIP, IL behaviors (e.g., demonstrating value for uniqueness) may signal to employees that their unique identities and expressions are accepted and supported, contributing to greater authenticity.

  • Hypothesis 4: IL is significantly and positively related to employee authenticity.

The Indirect Effect of IL on Emotional Exhaustion and Empowerment

Inclusion is often examined as a means for creating workplace environments that foster both well-being and performance and inclusive leaders are argued to play an important role to this end (Veli Korkmaz et al., 2022). Randel and colleagues (2018) suggest IL impacts psychological factors important for workplace outcomes through its effects on individuals’ experienced inclusion. In examining psychological factors as outcomes of experienced inclusion, we chose to focus on emotional exhaustion and, consistent with Randel and colleagues’ (2018) propositions, employee empowerment. These psychological factors are important for well-being and performance (respectively) and are also especially relevant to members of marginalized groups. For instance, recent reports show that employee burnout is high among members of marginalized groups (Burns et al., 2021). SHRM (2021) also reports that Black and Hispanic workers are more likely to feel exhausted upon leaving work in comparison to white workers. Indeed, consistent with the facades of conformity theory, stigmatized individuals try to minimize their unique attributes to assimilate with the group (Hewlin, 2003), which leads to emotional exhaustion (Hewlin, 2009). Further, evidence suggests that organizations often fail to provide equitable skill and relationship-building opportunities for members of historically marginalized groups (Caver & Livers, 2020), potentially hindering their socio-political support and thus feelings of empowerment. Therefore, we seek to examine the role of IL in influencing these outcomes through IL’s impact on measures of experienced inclusion: WGI and authenticity.

Emotional exhaustion, or “feelings of being emotionally overextended or drained” (Leiter & Maslach, 1988, p. 297), is a core aspect of burnout (Maslach et al., 2001). According to the Job Demands-Resources model (JD-R; Bakker & Demerouti, 2007), job demands cause strain, such as burnout, whereas job resources can buffer that strain. However, JD-R has since been extended by works that illustrate the direct effects of resources, such as social support, on strain, and, specifically, emotional exhaustion (Aronsson et al., 2017; Crawford et al., 2010). For instance, drawing from conservation of resources theory (Hobfoll, 1989), Crawford and colleagues (2010) discuss how resource depletion (real or the threat of loss) can cause strain that builds up over time, but additional resources may protect workers from this build up. Thus, WGI may provide social support, an important resource in JD-R (Bakker & Demerouti, 2007), that may act to help prevent the accumulation of strain. Indeed, previous research has provided initial support for this effect by illustrating the relationship between perceptions of military unit inclusion and emotional exhaustion (Merlini et al., 2019). Further, inhibiting one’s authentic self may be a psychological demand that, in accordance with JD-R, should lead to greater strain. Research shows suppression (e.g., hiding one’s felt emotions) leads to increased emotional exhaustion (Hülsheger & Schewe, 2011; Grandey, 2003). IL’s enhancement of employees’ authenticity may mitigate harmful suppression-related workplace demands and thus emotional exhaustion.

  • Hypothesis 5: IL has a negative indirect effect on employee emotional exhaustion through (a) WGI and (b) authenticity.

Empowerment involves one’s feelings that they have the desire and capability to have an impact at work (Spreitzer, 1995) and has been found to predict employee performance (Seibert et al., 2011). As discussed by Randel et al. (2018), when employees experience inclusion, they are provided with opportunities to provide their unique insights and make contributions to work-related decisions. Further, individuals who are provided these opportunities feel more of a sense of “both impact and control” (p. 198). Thus, the authors argue that this, in turn, should foster employee empowerment. Beyond opportunities to make direct contributions to work, feelings of empowerment are expected to be shaped, in part, by contextual factors such as socio-political support and internal factors, such as one’s belief in their capabilities to successfully perform work and make an impact (Seibert et al., 2011). WGI, involving feeling valued and cared about by workgroup members, may lead to greater information sharing and social connectedness within one’s workgroup, thus increasing socio-political support and subsequently empowerment. Similarly, authenticity may help in relationship building with colleagues (Song et al., 2020), thus fostering social support as well. Further, authenticity has been found to predict psychological capital (Song et al., 2020), which includes characteristics that may contribute to one’s perception of having an impact at work (e.g., efficacy, optimism; Luthans et al., 2015) and thus empowerment. IL’s enhancement of WGI and employees’ expression of their authentic selves may foster both contextual and psychological resources, leading to greater empowerment.

  • Hypothesis 6: IL has a positive indirect effect on employee empowerment through (a) WGI and (b) authenticity.

At this time, we reiterate the issues that prompted this study: employees in marginalized groups continue to face unjust barriers in the workplace (Caver & Livers, 2020; Roberts & Mayo, 2019) and IL is intended to help mitigate those barriers. Thus, a key question in this research is whether IL is indeed doing what it is supposed to do: supporting those in marginalized groups, for whom inclusion arguably matters the most. Indeed, working in the DEI space necessitates examining organizational data by group membership to determine whether DEI efforts are working as intended, ensuring the experiences of those in marginalized groups are not lost in the aggregate (Merlini & Williams, 2023; Roberts & Thomas-Hunt, 2022). However, in much previous research (e.g., Carmeli et al., 2010; Choi et al., 2017), the impact of IL is examined at the aggregate level across employees of various races and sexes, leaving us unaware of whether IL has equivalent or differential results for certain groups. Thus, we find it important to empirically explore the hypothesized relationships across various identity groups to ultimately better reveal whether IL is serving its core purpose.

  • Research Question 1: Are there differences in the hypothesized relationships on the basis of group membership (i.e., sex, racioethnicity)?

Method

Procedure and Participants

Data were obtained from participants recruited from CloudResearch (Litman et al., 2017) in Spring of 2021. Participants were employed at least 10 h per week, and parameters were set such that at least half the sample did not identify as “white.” This ensured greater representation of members of marginalized groups, as inclusion efforts typically fall short for these members (Roberts & Mayo, 2019). Further, we included participants who identified as white because antiracism and allyship behaviors engaged in by these members may be particularly impactful due to inherent privileges. This sampling also allowed investigation of group differences in relationships.

Participants opted into a three-wave study, taking surveys once a week for 3 weeks. This design was chosen to temporally separate the independent, mediator, and dependent variables to be consistent with the expected direction of the relationships (e.g., Mitchell & James, 2001) and to help reduce common method bias (Podsakoff et al., 2003), such that the independent variable was measured at time 1, the mediators at time 2, and the dependent variables at time 3. Although causality cannot be assumed due to factors such as the lack of random assignment, control, and potential confounds in this study, Hayes (2018) argues such a relationship can still be modeled and empirically tested. Participants were compensated $0.50 for the first two waves and $0.60 for the last wave.

There were 219 participants in the study; however, eight were removed upon failing multiple attention checks. This resulted in 211 participants in the first wave, 173 of those participants in the second, and 156 of the second wave participants in the third wave (full demographics in Table 7), which yielded adequate power for our analyses (Fritz & MacKinnon, 2007; see Footnote 9).
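The wave-to-wave attrition implied by these counts can be made explicit with a little arithmetic. The following is only an illustrative sketch: the counts come from the text, while the helper name `retention` and the choice to report percentages rounded to one decimal are assumptions made here, not part of the original analysis.

```python
# Sample sizes as reported in the text.
recruited = 219
removed_attention_checks = 8

wave_n = {
    "wave 1": recruited - removed_attention_checks,  # 211 retained after attention checks
    "wave 2": 173,
    "wave 3": 156,
}


def retention(later: int, earlier: int) -> float:
    """Percentage of the earlier wave still present in the later wave (hypothetical helper)."""
    return round(100 * later / earlier, 1)


wave2_of_wave1 = retention(wave_n["wave 2"], wave_n["wave 1"])  # wave 1 -> wave 2
wave3_of_wave2 = retention(wave_n["wave 3"], wave_n["wave 2"])  # wave 2 -> wave 3
wave3_of_wave1 = retention(wave_n["wave 3"], wave_n["wave 1"])  # overall completion

print(wave2_of_wave1, wave3_of_wave2, wave3_of_wave1)
```

Under these counts, roughly four in five wave 1 participants returned for wave 2, and about three quarters of the wave 1 sample completed all three waves.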

Table 7 Demographics of focal study sample

Wave 1 of the survey contained participant demographic questions along with measures of IL and diversity beliefs. Wave 2 contained measures of WGI and authenticity. Finally, Wave 3 contained measures of allyship, antiracist behavior, emotional exhaustion, and empowerment.

Measures

IL was assessed using the measure developed during Phase 1 of the current manuscript. Specifically, we utilized the 10-item measure representing overall IL using a five-point scale (1 = never; 5 = always).

WGI was assessed using Chung et al.’s (2020) 10-item measure using a five-point scale (1 = strongly disagree; 5 = strongly agree). An example item is “I feel that people really care about me in my work group.”

Authenticity was assessed using van den Bosch and Taris’s (2013) 12-item measure using a seven-point scale (1 = does not describe me at all; 7 = describes me very well). An example item is “I am true to myself at work in most situations.”

Allyship was assessed using five items of the behavioral dimension of Jones et al.’s (2014) ally identity measure (with modifications for the workplace and broadened from the LGBT focus), using a five-point scale (1 = strongly disagree; 5 = strongly agree). An example item is “I have engaged in efforts to promote more widespread acceptance of historically oppressed people.”

Antiracist behavior was assessed using nine items of Pieterse et al.’s (2016) individual advocacy dimension (with modifications for the workplace) of the Anti-Racism Behavioral Inventory (ARBI), using a five-point scale (1 = strongly disagree; 5 = strongly agree). An example item is “I interrupt racist conversations and jokes when I hear coworkers talking that way.”

Emotional exhaustion was assessed using Maslach and Jackson’s (1981) four-item measure, using a five-point scale (1 = once a week; 5 = several times a day). An example item is “I feel burned out from my work.”

Empowerment was assessed using Spreitzer’s (1995) 12-item measure, using a five-point scale (1 = strongly disagree; 5 = strongly agree). An example item is “I am confident about my ability to do my job.”

Diversity beliefs were measured using Homan et al.’s (2010) four-item measure as a control, using a seven-point scale (1 = completely disagree; 7 = completely agree). An example item is “I believe that diversity is good.”

Results

Descriptives, reliabilities, and correlations of composite study variables are in Table 8. Path analysis, using the lavaan package (v. 0.6–16; Rosseel, 2012) in RStudio, was used to estimate the model fit and test the hypotheses (see Footnote 10 and Fig. 1). The fit indices indicated good fit (χ² [4] = 3.15, p = .53, CFI = 1.00, TLI = 1.00, SRMR = 0.02, RMSEA = 0.00 [90% CI: 0.00, 0.09]; Hu & Bentler, 1999; see Table 3). Support was found for Hypotheses 1 and 2 such that IL was positively related to allyship (b = 0.28, p = .001) and antiracism (b = 0.32, p < .001). IL was also positively related to WGI (b = 0.57, p < .001) and authenticity (b = 0.25, p < .001), supporting Hypotheses 3 and 4. IL was indirectly related to emotional exhaustion through both WGI (b = −0.30, p < .001, 95% CI = −0.50 to −0.11) and authenticity (b = −0.17, p < .001, 95% CI = −0.26 to −0.08), supporting Hypothesis 5 (a and b). Finally, IL was indirectly related to empowerment through WGI (b = 0.31, p < .001, 95% CI = 0.20 to 0.41), but not through authenticity (b = 0.01, p = .49, 95% CI = −0.02 to 0.05), supporting Hypothesis 6a but failing to support Hypothesis 6b.
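For readers less familiar with indirect effects, the logic behind estimates like those above can be sketched as a product of two paths: the predictor-to-mediator path (a) and the mediator-to-outcome path controlling for the predictor (b). The sketch below is not the authors' lavaan model. It uses simulated data, plain least squares, and a percentile bootstrap (a simpler stand-in for the bias-corrected intervals reported in this study); all function names, coefficients, and seeds are hypothetical.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from regressing y on x (intercept included)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def ols_two_predictors(x, m, y):
    """Slopes from regressing y on x and m jointly (intercept included)."""
    mx, mm, my = statistics.fmean(x), statistics.fmean(m), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    det = sxx * smm - sxm ** 2  # determinant of the 2x2 normal equations
    b_x = (smm * sxy - sxm * smy) / det
    b_m = (sxx * smy - sxm * sxy) / det
    return b_x, b_m


def indirect_effect(x, m, y):
    """Product-of-coefficients indirect effect a * b."""
    a = ols_slope(x, m)                 # path a: predictor -> mediator
    _, b = ols_two_predictors(x, m, y)  # path b: mediator -> outcome, controlling for predictor
    return a * b


def percentile_bootstrap_ci(x, m, y, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the indirect effect (not bias-corrected)."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        estimates.append(
            indirect_effect([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        )
    estimates.sort()
    lower = estimates[round(alpha / 2 * n_boot)]
    upper = estimates[round((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Simulated data only: the coefficients below are invented for illustration.
sim = random.Random(42)
n = 156
x = [sim.gauss(0, 1) for _ in range(n)]                                # stand-in for the predictor
m = [0.6 * xi + sim.gauss(0, 1) for xi in x]                           # stand-in for the mediator
y = [-0.5 * mi + 0.1 * xi + sim.gauss(0, 1) for xi, mi in zip(x, m)]   # stand-in for the outcome

estimate = indirect_effect(x, m, y)
ci_low, ci_high = percentile_bootstrap_ci(x, m, y)
print(f"indirect effect = {estimate:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

An indirect effect is typically treated as supported when the bootstrap interval excludes zero, which is the decision rule the confidence intervals above reflect.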

Table 8 Descriptives and zero-order correlations from focal sample

Moderation Analyses

In addressing Research Question 1, a series of path analyses was conducted to determine whether there are meaningful differences in relationships on the basis of group membership. Four separate path analyses were run, differing only in the moderator variable: (a) whites vs. members of marginalized racioethnic groups, (b) males vs. non-males, (c) white males vs. male members of marginalized racioethnic groups, and (d) white non-males vs. non-male members of marginalized racioethnic groups (see Table 9). In each analysis, interaction effects were examined for the direct effects of IL on predicted outcomes (authenticity, allyship, ARBI, WGI). Additionally, conditional indirect effects from IL to predicted outcomes (empowerment and emotional exhaustion) through WGI and authenticity were examined to see whether racioethnic background, sex, and the interaction of racioethnic background and sex impacted the relationships. Specifically, we tested a moderated mediation model including a test of the conditional indirect effects for each mediated effect, using 95% bias-corrected confidence intervals. Unless otherwise specified, the coefficients that follow are significant (i.e., their confidence intervals do not contain zero).

Table 9 Moderation path analysis results

Across the models, there were only three significant interaction effects. Specifically, for males vs. non-males, the relationship between IL and authenticity was moderated by sex (b = 0.30, p = .03). Simple slope analyses revealed that the relationship between IL and authenticity was b = 0.10 [−0.04, 0.24] for males and b = 0.38 [0.25, 0.51] for non-males, thus stronger for non-males (see Fig. 2). Additionally, the relationship between IL and ARBI was moderated by sex (b = −0.38, p = .04). Simple slope analyses revealed that the relationship between IL and ARBI was b = 0.52 [0.28, 0.77] for males and b = 0.15 [−0.09, 0.38] for non-males, thus stronger for males (see Fig. 3). Finally, the indirect effect of IL on emotional exhaustion through authenticity was moderated by sex (indirect effect = −0.20, 95% bias-corrected bootstrapped CI [−0.50, −0.04], 1000 samples). Specifically, the indirect effect was −0.20 [−0.50, −0.03] for males and −0.41 [−0.99, −0.07] for non-males. Thus, the indirect effect was stronger for non-males than for males. Remaining moderation effects and conditional indirect effects were non-significant.
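The link between an interaction coefficient and the simple slopes can be illustrated with a short calculation. This sketch assumes a dummy-coded moderator (0 = males, 1 = non-males) with the male slope as the reference; the source does not state the coding used, so both the coding and the helper name `simple_slope` are assumptions. Only the reported point estimates are used.

```python
def simple_slope(b_reference: float, b_interaction: float, group_code: int) -> float:
    """Slope of the outcome on IL within a group coded 0 (reference) or 1 (hypothetical helper)."""
    return b_reference + b_interaction * group_code


# Reported estimates: male simple slope and interaction term for each outcome.
authenticity = {"male_slope": 0.10, "interaction": 0.30, "reported_non_male_slope": 0.38}
arbi = {"male_slope": 0.52, "interaction": -0.38, "reported_non_male_slope": 0.15}

# Implied non-male slopes under the assumed 0/1 coding.
implied_authenticity_non_males = round(
    simple_slope(authenticity["male_slope"], authenticity["interaction"], 1), 2
)
implied_arbi_non_males = round(simple_slope(arbi["male_slope"], arbi["interaction"], 1), 2)

print(implied_authenticity_non_males, authenticity["reported_non_male_slope"])
print(implied_arbi_non_males, arbi["reported_non_male_slope"])
```

Under this assumed coding, the implied non-male slopes land within 0.01 to 0.02 of the reported simple slopes. Gaps of that size may reflect rounding in the published values or model details not described here; that reading is an assumption, not something the source states.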

Fig. 2 Interaction effect between gender and IL on authenticity. Note. IL, inclusive leadership

Fig. 3 Interaction effect between gender and IL on ARBI. Note. IL, inclusive leadership; ARBI, antiracist behavior inventory

Discussion

Both practitioners and scholars point to IL as a vital component to making inclusion work in the workplace, yet empirical evidence to this end was lacking. This study sought to fill this gap by first developing a brief and conceptually up-to-date IL scale, then providing evidence of IL’s influence on (a) interpersonal behaviors that uphold inclusion (allyship, antiracism) and (b) experienced inclusion (WGI, authenticity). Further, we show evidence that IL, through experienced inclusion, has benefits for employee well-being and performance (emotional exhaustion, empowerment), demonstrating it is more than “nice to have.”

In Phase 1, results demonstrate compelling validity evidence for an IL scale that may be especially useful for practice in U.S. organizations. Specifically, item writing and reduction best practices (e.g., Anderson & Gerbing, 1991; Hinkin, 1998) were followed to create a behaviorally based scale measuring IL as defined by Randel and colleagues (2018). Further, we demonstrate convergent and discriminant validity evidence and good factor structure of the scale based on data collected from five distinct and heterogeneous U.S. samples over the course of 4 years.

In Phase 2, path analysis results demonstrate support for IL’s direct effects on allyship and antiracism and indirect effects, through WGI and authenticity, on emotional exhaustion and empowerment. Contrary to expectations, authenticity did not mediate IL’s relationship with empowerment. Although the reason for this is not clear, it could be that IL behaviors in and of themselves are directly empowering (i.e., direct effect) and/or empowering through the impact on WGI, rather than authenticity. Indeed, if authenticity fosters social connection (Song et al., 2020), WGI (which also contains components of social connection) may be accounting for explanatory effects of IL on empowerment, rather than authenticity itself.

Moderation results generally supported the conclusion that IL is indeed beneficial for those in marginalized groups to the same or a similar extent as for those in socially dominant groups. Specifically, when exploring group differences in these relationships, results revealed various consistencies among those in marginalized vs. socially dominant groups. This implies that the predicted relationships may function similarly among those who belong to marginalized groups and those who do not, contributing to the generalizability of findings and possibly even representing the “curb-cut” effect (Blackwell, 2016), which is the idea that equity-related initiatives often have benefits to those beyond whom they were initially meant to support. Taking another point of view, the consistency of results demonstrates evidence that IL is not only benefiting those who are already systemically advantaged due to their racioethnicity and sex.

Notably, however, some differences were found between the groups, but often in ways that suggest IL may have a greater impact for those in marginalized groups. For instance, IL had a stronger relationship with authenticity for non-males than males, and, relatedly, the indirect effect of IL on emotional exhaustion through authenticity was stronger for non-males than males. Although the reasons for these differences warrant additional investigation, it may be plausible that males have less of a need for an inclusive leader for their experienced authenticity as traditional workplace norms may already tend to align with stereotypically masculine tendencies (see Cheryan & Markus, 2022) and, thus to some extent, their authentic displays of behavior. To the extent that these norms exist in workplaces, non-males may benefit more from IL behaviors to demonstrate that their authentic selves are welcome. Likewise, this authenticity enhanced by an inclusive leader should reduce the need for psychological demands (e.g., suppressing authentic displays of emotion) that lead to emotional exhaustion for this group.

Additionally, moderation results revealed the relationship between IL and antiracism was stronger for males than non-males. Although the reason for this relationship is not clear, we consider whether IL may have more of a motivational impact on males by signaling that these antiracist behaviors are indeed expected of and supported for everyone. For instance, communal behaviors, like helping and socially oriented behaviors, tend to be regarded as more stereotypically female (e.g., Eagly & Steffen, 1984; Heilman & Chen, 2005). Thus, to the extent that antiracist behaviors are perceived as more communally oriented, IL behaviors may be particularly motivational for males by signaling that these behaviors should indeed be engaged in by them as well. Although more research is needed to investigate this possibility, previous research has demonstrated male/female differences in propensity to intervene in situations relating to sexual assault (e.g., Franklin et al., 2020), which may have parallels to other intervening behaviors, such as antiracist behaviors.

Theoretical Implications

Grounded in Ferdman’s (2014) multilevel framework and Randel et al.’s (2018) model of IL, our study supports ways in which IL influences workplace inclusion. Prior to this study, research has mainly focused on IL outcomes of group member experiences of well-being and performance. This left a gap in the current literature regarding whether IL fosters group member behaviors that uphold and reinforce workplace inclusion/DEI, thus co-creating inclusive workplaces. Indeed, Ferdman’s (2014) framework proposes the multi-directional influence of different levels of inclusion in the workplace, and this study takes an initial step in supporting that phenomenon. Specifically, we illustrate how IL can not only help employees feel included but can foster behaviors important for supporting equity and interrupting injustices (allyship and antiracism). These behaviors may support and uphold workplace inclusion to the extent that they weed out discriminatory behavior across the organization. In this way, IL goes beyond lip service for meeting DEI goals and becomes an evidence-based practice for enacting DEI in ways that likely have a more meaningful impact for those who often face the most barriers in the workplace.

We also add to Ferdman’s (2014) conceptual framework and Randel et al.’s (2018) model by integrating SIP and SLT (Bandura, 1977; Salancik & Pfeffer, 1978) to further explain how IL likely influences group member perceptions/experiences and behavior. Though many works have purported the importance of IL for driving inclusive workplaces, drawing from SIP and SLT, we drive the conversation forward by explaining how IL affects interpersonal behaviors and experiences. We then demonstrate supporting evidence for these relationships. In doing so, we partially support Randel et al.’s (2018) model and extend their seminal work by supporting IL-outcome relationships within (experienced inclusion, empowerment) and beyond (emotional exhaustion) what was proposed in their model. As previously discussed, emotional exhaustion may be especially relevant to the well-being of employees in marginalized groups. Beyond performance-related outcomes, employee well-being is valuable in its own right (see Tay et al., 2023, for compelling arguments that well-being is the ultimate criterion) and more organizations are heeding this view (SHRM, 2023). As work events continue to impact well-being (e.g., Office of the Surgeon General, 2022), research must continue to address ways to mitigate harmful experiences and facilitate positive experiences, especially for groups that tend to face more harmful experiences, and our study shows fostering inclusion may be helpful to this end.

Finally, we add evidence that IL and experienced inclusion can indeed impact employees who belong to marginalized groups. To move toward the goal of DEI, research must not be colorblind, ignoring differences among groups or treating differences as error (Salter & Haugen, 2017). Indeed, it is critical to demonstrate that inclusion-related efforts do not only benefit those who already benefit from systems of oppression. Though this is especially important for DEI-related research, examining whether relationships vary among various identity groups is also a vital way forward for psychological research in general, which has often neglected identity differences in the past (see Salter & Haugen, 2017).

Practical Implications

Our results suggest IL may be useful for driving inclusion in ways that matter for employees in marginalized groups: specifically, for those marginalized on the basis of racioethnicity and sex. Thus, leaders should strive to practice these behaviors in their everyday leadership. Indeed, by serving as role models and a valuable source of social information (Bandura, 1977; Thomas & Griffin, 1989), leaders who practice these IL behaviors may have a domino effect on the behaviors and experiences of various employees and, ultimately, the workplace environment (Salancik & Pfeffer, 1978; Treviño & Nelson, 2017). At a higher level, HR and senior leadership play vital roles in supporting inclusion (Buengeler et al., 2018; Offermann & Basford, 2013; Sabharwal, 2014) and thus likely help establish inclusive leaders. From a strategic talent management perspective, HR and senior leaders may be able to support IL behaviors by explicitly linking them to organizational strategy (e.g., DeLong & Trautman, 2011; Silzer & Dowell, 2010) and embedding them in talent management practices (training, performance management) (e.g., Hayles, 2013; Offermann & Basford, 2013). Further, organizational practitioners can draw from this research to highlight the benefits of IL practices and get buy-in for IL development efforts.

Additionally, if organizations foster IL, they must ensure they have systems that appropriately support subsequent allyship and antiracist behavior. Specifically, organizations will need ways to mitigate unintended consequences, like retaliation, and unhelpful intervening behaviors, like White saviorism (see Stone-Sabali et al., 2023). For example, safe channels for reporting discrimination must be available. Indeed, Harvard Business Review Analytic Services (2021) demonstrates the potential value of this approach in creating inclusive organizations. Further, literature on organizational contexts that mitigate retaliation in relation to reports of sexual harassment may also provide useful, parallel insights to this end (e.g., Bergman et al., 2002). Additionally, leaders may spend more time educating employees in socially dominant groups about forms of racism, such as how racism shows up subtly and overtly (see Hebl et al., 2020) and how and when to appropriately intervene (e.g., microinterventions, Sue et al., 2019).

Finally, we developed an IL scale based on Randel et al.’s (2018) theoretically grounded conceptualization and provide evidence that this scale may be promising for future use. This brief, 10-item IL scale may be especially practical for organizations intending to institutionalize IL behaviors. The brevity of the scale may allow it to be easily integrated into assessments of other leadership behaviors deemed strategically important for an organization. For instance, organizations may choose to embed this 10-item IL scale into a current upward- or 360-degree-feedback assessment as a means for diagnosing areas for leader improvement. Organizations may then reinforce IL behaviors by recognizing/rewarding leaders for having favorable assessment results and holding leaders accountable for unfavorable results. In turn, through assessing IL and subsequent reinforcement, organizations can take steps to institutionalize IL behavior.

Limitations and Future Directions

Several limitations and future directions should be noted. First, the methodology of the focal study leaves several areas for improvement. For example, the self-reported nature of the surveys may be subject to socially desirable responding. Future research should use multi-source data to determine if behaviors are truly enacted. As another example, the focal study data were collected with 1 week between each wave. Although temporal separation is helpful for reducing common method bias (Podsakoff et al., 2003) and for establishing the theoretical direction of the relationships (Mitchell & James, 2001), examining these relationships over different lengths of time may reveal important evidence regarding causal relationships. A 1-week separation represents potentially the simplest theoretical relationship; other configurations, such as those allowing for cyclical recursive causation (Configuration 6; Mitchell & James, 2001), may provide a better understanding of the theoretical relationships. Further, the current research design does not allow for the detection of causal effects because there was neither manipulation nor elimination of confounding variables; therefore, future research may be able to explore these potential effects as causal claims with appropriate theory and research design (e.g., Antonakis et al., 2010). Additionally, longer time frames would allow for the examination of model change, including stability or autoregressive effects (Adachi & Willoughby, 2015), cross-lagged effects (Zyphur et al., 2020), and latent growth modeling (Chan, 1998). Future research should consider examining these relationships using these methodologies in order to address potential confounds (e.g., influences of other leadership styles, group-level diversity, team cohesion) as well as potential reverse causality effects (e.g., antiracist behavior’s bottom-up influence on inclusive leadership).

Further, this research examines the current phenomenon in a single-level analysis, whereas multilevel data with individuals nested in workgroups under different leaders would provide more clarity regarding the effects of IL on overall workgroup inclusion and antiracism/allyship behaviors. Future research should consider examining the effects of IL on outcomes relevant to other levels of Ferdman’s (2014) framework (e.g., workgroup, organization) to further understand the multilevel nature of IL. Additionally, the number of participants in each wave was not large enough to compare sex and racioethnic identities with greater specificity or to examine other important identities. Future research should attempt to replicate these findings with additional important identities (e.g., marginalized groups stereotyped as “model minorities,” neurodivergent individuals) and intersections to determine the extent to which IL is beneficial for fostering truly and ubiquitously inclusive workplaces.

Second, IL could also benefit from additional theoretical and empirical support. Although Nembhard and Edmondson are credited with introducing IL in 2006, IL as a construct is still relatively nascent and there are several questions surrounding its conceptualization and theorization (Veli Korkmaz et al., 2022). Future research may benefit from examining IL behaviors beyond those covered by Randel et al. (2018) (e.g., sensemaking; Ferdman et al., 2021; sponsorship; Ibarra et al., 2010; skill utilization; Roberson & Perry, 2022) and from using phenomenological methodologies to discover behaviors that are most impactful for the inclusion of members of various marginalized groups (see Roberts et al., 2019). Future research should also investigate if IL behaviors differ at varying levels of leadership (e.g., senior leaders espousing inclusion-related values). Relatedly, our current measurement of IL focuses on a unidimensional construct, whereas prior research has suggested multiple dimensions (e.g., Randel et al., 2018; Veli Korkmaz et al., 2022). While our scale is meant to be a shortened measure of IL for practical reasons, we recognize the need for nuanced understandings of constructs. Future IL research may benefit from comparing our scale with more nuanced measures of the IL construct (e.g., Al-Atwi & Al-Hassani’s [2021] scale).

Research is also needed to determine whether IL is distinct from other established forms of leadership and whether it provides incremental value over other leadership styles. This is vital to mitigate construct proliferation and the jingle-jangle fallacy (Kelley, 1927). Although our measure development results provide evidence of some distinctions and there are compelling reasons to believe IL is conceptually distinct from other leadership styles (e.g., see Randel et al., 2018 [Table 1]), additional exploration is needed. Indeed, our results show the 10-item IL scale correlates strongly with other scales, suggesting IL may need greater refinement in conceptualization and/or measurement. Additionally, although Randel et al. (2018) describe IL as different from other forms of leadership, like transformational leadership and servant leadership, we did not test its empirical distinctness from these forms of leadership. Thus, future research should examine these relationships empirically to further disentangle IL from other leadership styles. For example, future research could examine IL’s effect on particular work outcomes, such as workgroup inclusion, while controlling for other leadership styles to isolate IL’s unique effects. Additionally, future research could further disentangle IL from other leadership styles by using recent advances in discriminant validity analyses (e.g., Rönkkö & Cho, 2022). Moreover, IL operationalizations, including that of the present study, would benefit from critical analysis in light of recent leadership critiques, such as Fischer and Sitkin’s (2023) points regarding valence-based conflation (e.g., is IL simply tapping into positive leadership evaluations?) and causal indeterminacy (e.g., if employees feel included, is their leader inclusive?). However, the present measure attempts to mitigate this issue by focusing on leader behavior rather than follower perceptions of inclusion. Further, our results are consistent with this, providing support for the distinction between IL and WGI. Nonetheless, future research could build upon these preliminary findings by using methodological techniques such as those mentioned previously (e.g., cross-lagged effects, recursive causation).

With regard to IL’s nomological network, antecedents should be explored to identify why and how IL manifests. Though some antecedents to IL have been proposed (e.g., pro-diversity beliefs; Randel et al., 2018), additional research is needed (Veli Korkmaz et al., 2022). For instance, environmental characteristics like accountability for inclusivity may be particularly valuable for the manifestation of these behaviors (Molefi et al., 2021). Further, although this study’s findings highlight important outcomes of IL, additional evidence is needed to explore IL’s impact on other DEI-related outcomes, for example, whether IL impacts additional inclusion-related metrics (e.g., representativeness across organizational levels) and whether antiracist and allyship behaviors stemming from IL extend beyond the organization to the broader community in which employees are situated (e.g., Radke et al., 2020).

Finally, more research is needed to specify the boundary conditions of IL. For instance, one question is whether leader-follower similarities matter (e.g., if the leader’s and follower’s races/ethnicities differ, will IL behaviors be more influential for the employee’s experienced inclusion?). Additionally, some previous research has provided initial evidence suggesting inclusion can have negative implications. Specifically, Xiaotao and colleagues (2018) provide evidence of an inverted U-shaped relationship between IL and task performance, arguing that when employees no longer feel jeopardized by exclusion, positive outcomes may decline. This phenomenon could be attributed to the “too much of a good thing” (TMGT) effect (i.e., supportive antecedents become unfavorable when taken to the extreme; Pierce & Aguinis, 2013). Similarly, IL could also have adverse consequences for performance to the extent that it results in slower decision-making due to difficulty incorporating employees’ diverse views (Ames & Flynn, 2007; Bacon & Severson, 1986). Future research that applies a paradoxical lens to IL (see Ferdman, 2017) may be particularly valuable in identifying how to overcome potential challenges that arise when driving inclusion.

Conclusion

Although workplace inclusion is a strategic priority, achieving inclusion is an unmet goal for most organizations. However, the current study provides evidence that IL is an important strategy in achieving this goal. Results illustrate that IL plays a role in positively predicting employee perceptions that are not only indicative of workplace inclusion but also particularly relevant to the experiences of employees in marginalized groups. Furthermore, IL positively predicted behaviors that likely help establish and maintain truly inclusive workplaces. Indeed, as Ferdman (2014, p. 16) states, “as more people and groups experience inclusion, they are more likely to have a shared sense of what it takes to create a more inclusion for themselves and others and to incorporate this learning into the ongoing processes and practices of the groups and organizations of which they are part.”