Organizational researchers are paying increasing attention to the phenomenon of disparity, which refers to the extent to which a valued resource is concentrated in a few individuals (Blau, 1977; Harrison & Klein, 2007). Some of this work has focused on the conceptualization and measurement of disparity (Allison, 1978; Atkinson, 1970; Dawson, 2011; Harrison & Klein, 2007; Kokko, Mackenzie, Reynolds, Lindström, & Sutherland, 1999). Other empirical work has examined the impacts, on outcomes, of various types of disparity, including pay disparity (Fredrickson, Davis-Blake, & Sanders, 2010; Gupta, Conroy, & Delery, 2012; Messersmith, Guthrie, Ji, & Lee, 2011; Siegel & Hambrick, 2005; Trevor, Reilly, & Gerhart, 2012), power disparity (Curşeu & Sari, 2013; Greer & van Kleef, 2010; Ronay, Greenaway, Anicich, & Galinsky, 2012; Smith, Houghton, Hood, & Ryman, 2006), status disparity (Christie & Barling, 2010), contribution-based disparity (Daniel, Agarwal, & Stewart, 2013), subgroup disparity (Carton & Cummings, 2012), structural hole disparity (Rowley, Greve, Rao, Baum, & Shipilov, 2005), cognitive disparity (Curşeu, Schruijer, & Boroş, 2007), and knowledge disparity (Han, Han, & Brass, 2014).

In doing so, researchers have operationalized disparity in various ways, such as the coefficient of variation (CV), Gini coefficient, standard deviation (SD), Theil index, and relative mean deviation (RMD, also known as Schutz’s coefficient). These five disparity measures are all widely used in the fields of economics and sociology (Allison, 1978; Atkinson, 1970; Besley & Burgess, 2003; Cowell, 2011; Lin & Huang, 2011), but only CV, Gini (e.g., Carnahan, Agarwal, & Campbell, 2012; Yanadori & Cui, 2013), and SD (e.g., Greer & van Kleef, 2010; Trevor et al., 2012) are widely used by organizational disparity researchers. Because there are very limited criteria for disparity measure selection, researchers usually choose disparity measures on the basis of their familiarity or convenience (Allison, 1978). Furthermore, it is possible that researchers may choose different disparity measures because these measures are relatively highly correlated (e.g., .97 for Gini and RMD, .80 for Gini and Theil, and .83 for Theil and RMD: Lin & Huang, 2011; .75 for CV and Gini: Shaw, Gupta, & Delery, 2002; .70–.74 for CV/Gini and SD: Trevor et al., 2012).

Because CV, Gini, SD, Theil, and RMD have different properties (Cowell, 2011; Harrison & Sin, 2006; Kokko et al., 1999), however, our concern is that using them interchangeably may lead to inconsistent findings. For example, using the same data set, Curşeu et al. (2007) drew different conclusions on the basis of CV and Gini, and thus questioned the parallelism of these disparity measures. More recently, a review of the pay disparity literature reported that pay disparity and performance could be unrelated, positively related, or negatively related (Trevor et al., 2012, pp. 587–588). Thus, examining possible reasons for these inconsistent conclusions and, on the basis of these explanations, offering appropriate suggestions about measure selection will be useful for future disparity research. Although some economists and sociologists (e.g., Allison, 1978) have offered several suggestions regarding disparity measure selection, these suggestions have been made from a macro-level perspective, which may not be suitable for organizational disparity research.

In the present research, we focus on the measurement of organizational disparity as a possible reason for inconsistencies in the disparity literature, examine the similarities and potential differences among the available measures, and offer suggestions for measure selection for organizational disparity research. Within the context of pay disparities, we conducted two simulation studies. The first was designed to examine the empirical relations among CV, Gini, SD, Theil, and RMD, and the second was designed to examine the similarities and differences among CV–outcome relations, Gini–outcome relations, SD–outcome relations, Theil–outcome relations, and RMD–outcome relations.

Literature review

Properties and measures of organizational disparity

In this article, we treat disparity and inequality as equivalent terms, and will use only disparity to ensure writing uniformity. As we noted earlier, disparity refers to the extent to which a valued resource is concentrated in a few individuals (Blau, 1977; Harrison & Klein, 2007). In economics, the Lorenz curve is “the gold standard” for the extent of concentration (Wolfson, 1994). If we rank individual incomes from lowest to highest, and use the cumulative proportion of the population as the horizontal axis and the cumulative proportion of income as the vertical axis, we can get the Lorenz curve, shown in Fig. 1 (Lines A, B, and C), which depicts social wealth or income distribution. If every person has the same income, the cumulative proportion of the population equals the cumulative proportion of income, which is the situation of perfect equality (Line A). However, if the poorest 80 % of the population earns only 35 % of the total income (Point p in Fig. 1), disparity clearly exists. The most famous disparity measure, the Gini coefficient, is “equal to twice the area between the Lorenz curve and the line of perfect equality” (Allison, 1978, p. 872). Because the area between Line C and Line A in Fig. 1 is bigger than that between Line B and Line A, the degree of disparity is greater in the former than in the latter situation.

Fig. 1 Lorenz curves for three disparity situations
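The Lorenz construction described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original analysis: the function names `lorenz_points` and `gini_from_lorenz` are hypothetical, and the curve is assumed to be piecewise linear between the ranked individuals, so that the area can be obtained with the trapezoid rule.

```python
def lorenz_points(incomes):
    """Return (cumulative population share, cumulative income share) points."""
    ranked = sorted(incomes)          # rank incomes from lowest to highest
    total = sum(ranked)
    n = len(ranked)
    points = [(0.0, 0.0)]
    running = 0.0
    for k, value in enumerate(ranked, start=1):
        running += value
        points.append((k / n, running / total))
    return points


def gini_from_lorenz(incomes):
    """Twice the area between the line of perfect equality and the Lorenz curve."""
    pts = lorenz_points(incomes)
    # Trapezoid rule for the area under the piecewise-linear Lorenz curve.
    area_under = sum((x1 - x0) * (y0 + y1) / 2
                     for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    # The area under the line of perfect equality is 0.5.
    return 2 * (0.5 - area_under)


equal_incomes = [3000] * 5                       # perfect equality (Line A)
unequal_incomes = [1000, 2000, 3000, 4000, 5000]

print(gini_from_lorenz(equal_incomes))
print(gini_from_lorenz(unequal_incomes))
```

Under these assumptions, the equal-income unit yields a Gini of zero, and a Lorenz curve that lies farther below the line of perfect equality yields a larger value.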

In order to assess the degree of disparity, economists, sociologists, and management scholars use various measures. Specifically, CV, Gini, SD, Theil, and RMD are the most widely used measures of disparity (computational formulas for these measures are shown in Table 1; Allison, 1978; Atkinson, 1970; Besley & Burgess, 2003; Cowell, 2011; Lin & Huang, 2011). In general, disparity has the following six properties, which any measure of disparity should satisfy.

Table 1 Measures of disparity

First, the minimum disparity score is zero; this occurs when all members in one unit have the same resource (Champernowne, 1974). For example, if every team member’s income or power is equal, there is no income or power disparity in this team. As is shown in Table 1, all five measures meet this property.

Second, as is also shown in Table 1, the maximum disparity score occurs when one member has all the resources and all other members have nothing (Harrison & Klein, 2007; Solanas, Selvam, Navarro, & Leiva, 2012). CV, Gini, Theil, and RMD all meet this property, but SD does not. Instead, the maximum SD exists when members are evenly distributed at the two ends of possible values of a specific construct (Harrison & Klein, 2007; Harrison & Sin, 2006). For example, Greer and van Kleef (2010) used a 5-point Likert scale to measure the power of group members (e.g., 0 powerlessness to 4 extremely powerful) and used SD to measure the power disparity. The maximum SD value in an eight-person team existed when four team members scored 0 and the other four members scored 4 (as is shown in Table 1). In this case, the value of SD would be 2 [i.e., (4 – 0)/2], which is greater than the value in the situation in which one member scored 4 and the other seven members scored 0 (SD = 1.32). However, the CV (1), Gini (.50), Theil (0.69), and RMD (.50) values in the team with four powerful and four powerless members are much lower than those in the team with one powerful and seven powerless members (CV = 2.65, Gini = .88, Theil = 2.08, and RMD = .88).
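The first two properties can be checked numerically. The sketch below is illustrative only: because Table 1 is not reproduced here, the formulas are assumed to be the standard population forms (SD divides by n; Theil treats a zero score as contributing zero), which reproduce the values reported for the eight-person example. The helper names are hypothetical.

```python
import math


def sd(x):
    """Population standard deviation (divides by n, not n - 1)."""
    u = sum(x) / len(x)
    return math.sqrt(sum((v - u) ** 2 for v in x) / len(x))


def cv(x):
    """Coefficient of variation: SD divided by the mean."""
    return sd(x) / (sum(x) / len(x))


def gini(x):
    """Sum of absolute differences over all ordered pairs, scaled by 2 * n^2 * mean."""
    n, u = len(x), sum(x) / len(x)
    return sum(abs(a - b) for a in x for b in x) / (2 * n * n * u)


def theil(x):
    """Theil index; zero scores contribute 0 (the limit of t * ln(t) as t -> 0)."""
    u = sum(x) / len(x)
    return sum((v / u) * math.log(v / u) for v in x if v > 0) / len(x)


def rmd(x):
    """Relative mean deviation (Schutz's coefficient)."""
    n, u = len(x), sum(x) / len(x)
    return sum(abs(v - u) for v in x) / (2 * n * u)


MEASURES = {"CV": cv, "Gini": gini, "SD": sd, "Theil": theil, "RMD": rmd}


def describe(x):
    """Return all five measures for one unit as a dict."""
    return {name: f(x) for name, f in MEASURES.items()}


equal_team = [2] * 8            # every member has the same power score
split_team = [0] * 4 + [4] * 4  # four powerless, four extremely powerful
one_holder = [0] * 7 + [4]      # one member holds all of the power

for label, team in [("equal", equal_team), ("split", split_team),
                    ("one holder", one_holder)]:
    print(label, {k: round(v, 2) for k, v in describe(team).items()})
```

With these definitions, all five measures are zero for the equal team, SD is largest for the split team, and the other four measures are largest for the team in which one member holds everything.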

Third, disparity is scale invariant; that is, it should not be impacted by the scale on which the variables are measured (Allison, 1978; Sørensen, 2002). For example, team pay disparity should be unchanged, regardless of whether the team members’ pay is measured in dollars (x) or cents (100x). CV, Gini, Theil, and RMD all meet this property, but SD does not. To illustrate, consider the pay disparity in a five-member team ($1,000, $2,000, $3,000, $4,000, $5,000). If every team member’s pay were doubled, the team SD would be doubled (from 1,414.21 to 2,828.43); however, the values of CV (0.47), Gini (.27), Theil (0.12), and RMD (.20) would remain the same.
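The scale-invariance example can be reproduced with a short script. This is a sketch under the assumption that the measures take the standard population forms (Table 1 is not reproduced here); `measures` is a hypothetical helper name.

```python
import math


def measures(x):
    """Five disparity measures for one unit (population formulas; an assumption)."""
    n = len(x)
    u = sum(x) / n
    sd = math.sqrt(sum((v - u) ** 2 for v in x) / n)
    return {
        "CV": sd / u,
        "Gini": sum(abs(a - b) for a in x for b in x) / (2 * n * n * u),
        "SD": sd,
        "Theil": sum((v / u) * math.log(v / u) for v in x if v > 0) / n,
        "RMD": sum(abs(v - u) for v in x) / (2 * n * u),
    }


team = [1000, 2000, 3000, 4000, 5000]
doubled = [2 * v for v in team]      # every member's pay is doubled

before, after = measures(team), measures(doubled)
for name in before:
    print(f"{name:5s} {before[name]:10.2f} {after[name]:10.2f}")
```

Only the SD row changes between the two columns; the four mean-normalized measures are unaffected by the rescaling.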

Fourth, disparity is an asymmetric construct (Harrison & Klein, 2007). Again, CV, Gini, Theil, and RMD all meet this property, but SD does not. To illustrate, consider the distribution of pay in two symmetric five-member teams: Team A ($1,000, $1,000, $1,000, $1,000, $9,000) and Team B ($9,000, $9,000, $9,000, $9,000, $1,000). CV, Gini, Theil, and RMD are all asymmetric measures; thus, their values will be quite different. Team A has more poor employees and its disparity is higher (CV = 1.23, Gini = .49, Theil = 0.57, and RMD = .49), whereas Team B has more rich employees and its disparity is lower (CV = 0.43, Gini = .17, Theil = 0.13, and RMD = .17). Because SD is a symmetric measure, however, the teams will have identical SD values (3,200).
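The asymmetry property can be illustrated with the two mirrored teams. The sketch assumes the standard population forms of the measures (Table 1 is not reproduced here), and `measures` is a hypothetical helper name.

```python
import math


def measures(x):
    """Five disparity measures for one unit (population formulas; an assumption)."""
    n = len(x)
    u = sum(x) / n
    sd = math.sqrt(sum((v - u) ** 2 for v in x) / n)
    return {
        "CV": sd / u,
        "Gini": sum(abs(a - b) for a in x for b in x) / (2 * n * n * u),
        "SD": sd,
        "Theil": sum((v / u) * math.log(v / u) for v in x if v > 0) / n,
        "RMD": sum(abs(v - u) for v in x) / (2 * n * u),
    }


team_a = [1000, 1000, 1000, 1000, 9000]   # four low-paid members, one high-paid
team_b = [9000, 9000, 9000, 9000, 1000]   # mirror image of Team A

a, b = measures(team_a), measures(team_b)
for name in a:
    print(f"{name:5s} A = {a[name]:8.2f}   B = {b[name]:8.2f}")
```

SD depends only on the deviations from the mean, which are mirror images in the two teams, whereas the other four measures divide by a mean that differs between the teams (2,600 vs. 7,400).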

Fifth, disparity should decrease if the specific resource is transferred from a higher-resource person to a lower-resource person; this is called the principle of transfers (Allison, 1978; Dalton, 1920). To illustrate, consider a four-member team in which the team members’ pay is distributed as follows: $1,000 (member A), $2,000 (member B), $3,000 (member C), and $4,000 (member D). If $800 is transferred from member D (a higher-pay team member) to member A (a lower-pay team member), the pay disparity in this team should decrease. This process could be illustrated by the change in the area between the Lorenz curve and the line of perfect equality (Fig. 2). The lower curve is the Lorenz curve before transferring, and the upper curve is the Lorenz curve after transferring. Because the area becomes smaller after transferring, the pay disparity decreases. Dalton (1920) and Allison (1978) have shown that CV, Gini, SD, and Theil meet the principle of transfers. However, RMD is not “affected by transfers between persons who are both below the mean or both above it” (Allison, 1978, p. 868); thus, it does not always meet this property.

Fig. 2 Example of the principle of transfer in which specific pay is transferred from a higher-paid member to a lower-paid member in a four-member team
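The principle of transfers, and the way in which RMD can fail it, can be illustrated as follows. The first transfer is the $800 example from the text. The second transfer ($400 from member D to member C, both of whom are above the team mean) is a hypothetical case added here to illustrate Allison's (1978) point. The formulas are assumed to be the standard population forms, and `measures` and `transfer` are hypothetical helper names.

```python
import math


def measures(x):
    """Five disparity measures for one unit (population formulas; an assumption)."""
    n = len(x)
    u = sum(x) / n
    sd = math.sqrt(sum((v - u) ** 2 for v in x) / n)
    return {
        "CV": sd / u,
        "Gini": sum(abs(a - b) for a in x for b in x) / (2 * n * n * u),
        "SD": sd,
        "Theil": sum((v / u) * math.log(v / u) for v in x if v > 0) / n,
        "RMD": sum(abs(v - u) for v in x) / (2 * n * u),
    }


def transfer(x, giver, receiver, amount):
    """Return a copy of x after moving `amount` from one member to another."""
    y = list(x)
    y[giver] -= amount
    y[receiver] += amount
    return y


team = [1000, 2000, 3000, 4000]            # members A, B, C, D (mean = 2,500)
across_mean = transfer(team, 3, 0, 800)    # D -> A, as in the text
above_mean = transfer(team, 3, 2, 400)     # D -> C, both above the mean (hypothetical)

base = measures(team)
for label, new_team in [("D -> A", across_mean), ("D -> C", above_mean)]:
    changed = measures(new_team)
    print(label, {k: round(changed[k] - base[k], 4) for k in base})
```

In this sketch, the transfer across the mean lowers all five measures, whereas the transfer between two members who are both above the mean lowers CV, Gini, SD, and Theil but leaves RMD unchanged.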

Sixth, if k teams, each containing n team members and having the same distribution of pay, are aggregated into a single unit (population), then the disparity of the single unit of kn team members is the same as that of each of the constituent teams; this is called the principle of population replication (Amiel & Cowell, 1992; Shorrocks, 1980). All five of the measures meet this property. To illustrate, consider two three-member teams with the same pay distribution: Team A ($2,000, $3,000, $5,000) and Team B ($2,000, $3,000, $5,000). If these two teams are aggregated into a single six-member team, Team C ($2,000, $2,000, $3,000, $3,000, $5,000, $5,000), the pay disparity in Team C is the same as it had been in Team A or Team B (CV = 0.37, Gini = .20, SD = 1,247.22, Theil = 0.07, and RMD = .17).
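The replication example can be verified directly. As before, this sketch assumes the standard population forms of the measures (in particular, SD divides by n; a sample SD that divides by n – 1 would not satisfy the property), and `measures` is a hypothetical helper name.

```python
import math


def measures(x):
    """Five disparity measures for one unit (population formulas; an assumption)."""
    n = len(x)
    u = sum(x) / n
    sd = math.sqrt(sum((v - u) ** 2 for v in x) / n)
    return {
        "CV": sd / u,
        "Gini": sum(abs(a - b) for a in x for b in x) / (2 * n * n * u),
        "SD": sd,
        "Theil": sum((v / u) * math.log(v / u) for v in x if v > 0) / n,
        "RMD": sum(abs(v - u) for v in x) / (2 * n * u),
    }


team_a = [2000, 3000, 5000]
team_b = [2000, 3000, 5000]
team_c = sorted(team_a + team_b)   # the two teams aggregated into one unit

single, pooled = measures(team_a), measures(team_c)
for name in single:
    print(f"{name:5s} {single[name]:10.2f} {pooled[name]:10.2f}")
```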

We agree with some researchers’ propositions that there is no single “best” measure of disparity (Champernowne, 1974; Ray & Singer, 1973). Although CV, Gini, and Theil meet all of the properties of disparity, they all have some weaknesses. First, as is shown in Table 1, the maximum values of CV, Gini, and Theil are \( \sqrt{n-1} \), 1 – 1/n, and ln(n), respectively, all of which are affected by the unit size (n) (Solanas et al., 2012). For example, for a three-person team, the maximum values of CV, Gini, and Theil are \( \sqrt{2} \), 2/3, and ln(3), respectively. However, for a ten-person team, the maximum values of CV, Gini, and Theil are 3, 0.9, and ln(10), respectively. Researchers, depending on their different theoretical concerns, might want a disparity measure that increases with n, decreases with n, or is insensitive to changes in unit size (Ray & Singer, 1973). Thus, CV, Gini, and Theil might not be suitable for some research situations.
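The dependence of the maxima on unit size can be checked by constructing the maximum-disparity unit (one member holds everything) for different values of n. The sketch assumes the standard population forms of the three measures, and `max_disparity_scores` is a hypothetical helper name.

```python
import math


def max_disparity_scores(n):
    """CV, Gini, and Theil when one of n members holds the entire resource."""
    x = [0.0] * (n - 1) + [1.0]
    u = sum(x) / n
    sd = math.sqrt(sum((v - u) ** 2 for v in x) / n)
    cv = sd / u
    gini = sum(abs(a - b) for a in x for b in x) / (2 * n * n * u)
    # Zero scores contribute nothing to Theil (limit of t * ln(t) as t -> 0).
    theil = sum((v / u) * math.log(v / u) for v in x if v > 0) / n
    return cv, gini, theil


for n in (3, 10):
    cv, gini, theil = max_disparity_scores(n)
    # Compare the computed maxima with the closed forms given in the text.
    print(n, round(cv, 3), round(math.sqrt(n - 1), 3),
          round(gini, 3), round(1 - 1 / n, 3),
          round(theil, 3), round(math.log(n), 3))
```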

Second, some researchers have suggested that different disparity measures are suitable for different distributions of variables (Allison, 1978; Braun, 1988). For example, it has been suggested that Theil is more suitable for pay distribution with diminishing marginal utility, Gini is more sensitive to changes in middle-income groups rather than in either low-income or rich groups, and CV is more suitable for variables “where utility is neither strictly increasing nor especially relevant” (Allison, 1978, p. 877), such as age. These suggestions, however, are made from macro-level perspectives, which may not be suitable for micro-level organizational disparity research. In micro-level research, the unit size is much smaller (e.g., 5–15 persons in a micro-level team) than in macro-level research (e.g., 1,000,000 persons in a macro-level province). This makes the variable distribution much less important in a micro unit (Harrison & Sin, 2006). Indeed, some micro-level organizational researchers have found that CV, Gini, and Theil were highly correlated and that results based on different measures were robust (Christie & Barling, 2014; Fredrickson et al., 2010; He & Huang, 2011; Onaran, 1992; Trevor & Wazeter, 2006). One of the possible reasons for these high correlations among the different disparity measures is that CV, Gini, and Theil all represent dispersion divided by the mean (Allison, 1978), and some of them could be expressed by a general formula:

$$ D=\frac{\left(\frac{1}{2n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}\left|x_i-x_j\right|^{r}\right)^{\frac{1}{r}}}{u}, $$

where \( x_i \) and \( x_j \) are the members’ scores (e.g., pay) in a unit; u is the average value of the unit members’ scores; and n is the number of unit members (unit size). When r = 1, D is the Gini measure. When r = 2, D is the CV measure.
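The two special cases of the general formula can be confirmed numerically. In this sketch, `general_d` is a hypothetical name for D, and the reference implementations of Gini and CV are assumed to be the standard population forms.

```python
import math


def general_d(x, r):
    """General formula D: r = 1 gives Gini, r = 2 gives CV."""
    n = len(x)
    u = sum(x) / n
    total = sum(abs(a - b) ** r for a in x for b in x)
    return (total / (2 * n * n)) ** (1 / r) / u


def gini(x):
    """Reference Gini: absolute differences over all ordered pairs."""
    n, u = len(x), sum(x) / len(x)
    return sum(abs(a - b) for a in x for b in x) / (2 * n * n * u)


def cv(x):
    """Reference CV: population SD divided by the mean."""
    n, u = len(x), sum(x) / len(x)
    return math.sqrt(sum((v - u) ** 2 for v in x) / n) / u


pay = [1000, 2000, 3000, 4000, 5000]
print(general_d(pay, 1), gini(pay))
print(general_d(pay, 2), cv(pay))
```

The r = 2 case works because the sum of squared pairwise differences over all ordered pairs equals 2n² times the population variance.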

Furthermore, although SD and RMD do not meet some properties of disparity and may not qualify as legitimate measures of disparity, using them to measure disparity is not uncommon in organizational research and other social science fields. Possibly, this is due to the facts that SD/RMD and CV/Gini/Theil are very highly related (Shaw, 2014; Trevor et al., 2012) and that, in some studies, the relations between these different measures and outcomes either have not differed (Carnahan et al., 2012; Grund & Westergaard-Nielsen, 2008; Trevor et al., 2012) or have differed very little (Halevy, Chou, Galinsky, & Murnighan, 2012, p. 404).

It is not always the case, however, that different “disparity” measures yield consistent results (Curşeu et al., 2007). Indeed, many inconsistent results from pay disparity research have been reported in empirical studies (Bloom, 1999; Curşeu et al., 2007; Shaw et al., 2002) and in a recent review conducted by Trevor et al. (2012, pp. 587–588). Although there may be many reasons why different researchers have arrived at different conclusions, measurement problems have often led to difficulties in interpreting the results of field research.

Overview of disparity measures in organizational research

The purpose of the present study is to examine the similarities, and potential differences, among the various disparity measures used in organizational research. Specifically, we need to identify the correlations between different disparity measures, to identify the research contexts (e.g., sample characteristics) in which the various disparity measures differ, to identify which measures are inappropriate for assessing organizational disparity, and to identify the impact of using an inappropriate measure. In order to better understand the relevant literature on the concerns above, we reviewed empirical disparity studies published in the management and psychology fields from 1992 to 2013. We conducted electronic searches in the following databases: ABI/Inform, EBSCOhost, PsycInfo, Elsevier Science Direct, JSTOR, and Thomson Reuters Web of Science. The keywords used to search these databases included disparity, inequality, dispersion, coefficient of variation, Gini, standard deviation, Theil, and relative mean deviation/Schutz’s coefficient in conjunction with group, team, firm, or organization. We focused only on studies that used a particular measure (CV, Gini, SD, Theil, or RMD) to operationalize the corresponding disparity construct. We excluded studies that used the ratio (highest/lowest) measure, because this measure loses critical information about the resource distribution across the entire team (Cowell, 2011; Ray & Singer, 1973; Smith et al., 2006). We also excluded the gap measure, which refers to the difference between the leader’s resources and the average resources of the other members, because it ignores resource variation across nonleader members. On the basis of these criteria, we included a total of 42 empirical organizational disparity studies in our review.

Types of disparity

Table 2 summarizes the key elements of these studies. As is shown, pay disparity is the most commonly examined form of disparity, although, recently, researchers have begun to focus on disparity based on other variables, such as status (Bendersky & Shah, 2013; Christie & Barling, 2010) and power (Greer & van Kleef, 2010; Smith et al., 2006).

Table 2 Key elements of organizational disparity studies

Measures application

CV and Gini are clearly the most widely used disparity measures in organizational research. Seven studies have used SD to assess either power disparity (Greer & van Kleef, 2010) or pay disparity (Canal Domínguez & Gutiérrez, 2004; Carnahan et al., 2012; Grund & Westergaard-Nielsen, 2008; Halevy et al., 2012; Mahy, Rycx, & Volral, 2011; Trevor et al., 2012). Much less common in this organizational literature is the use of the Theil or RMD measure; as Table 2 shows, these have been used in only two studies (He & Huang, 2011; Onaran, 1992).

Correlations between various measures

Four studies have reported the correlation coefficients between CV and Gini; these ranged from .71 to .99 (Bloom, 1999; He & Huang, 2011; Shaw et al., 2002; Trevor & Wazeter, 2006). Two studies have reported that the correlations between CV and SD ranged from .70 to .86 (Meslec & Curşeu, 2013; Trevor et al., 2012). One study showed that Gini, Theil, and RMD were highly related (>.95; He & Huang, 2011). Furthermore, on the basis of the original data reported in Onaran’s (1992) study, we calculated the correlation between Gini and Theil and found that it was very high (.91). However, because the evidence regarding the correlations among different disparity measures is very limited, we do not know whether high correlations are universal.

Consistency of the results based on different measures

Eight studies have used both CV and Gini (Bloom, 1999; Curşeu et al., 2007; Fredrickson et al., 2010; Grund & Westergaard-Nielsen, 2008; He & Huang, 2011; Shaw et al., 2002; Trevor et al., 2012; Trevor & Wazeter, 2006), and three of them pointed out that the results between CV/Gini and the outcome variables were equivalent (Fredrickson et al., 2010; Grund & Westergaard-Nielsen, 2008; Trevor et al., 2012). However, another three studies reported inconsistent conclusions based on the same data assessed with CV and Gini (Bloom, 1999, p. 32; Curşeu et al., 2007, p. 197; Shaw et al., 2002, p. 503). Three studies reported that using SD, instead of CV or Gini, as the disparity measure did not change the relations between disparity and outcomes (Carnahan et al., 2012; Grund & Westergaard-Nielsen, 2008; Trevor et al., 2012). Since the remaining studies did not report these data, it is difficult to draw strong conclusions regarding this issue.

Research contexts

In considering the studies in Table 2, it should be recognized that they have very different sample characteristics (or “research contexts”). For example, the mean values of unit disparity were quite different, ranging from 0.05 to 41.09 for CV, from .02 to .60 for Gini, and from 0.05 to 0.19 for Theil. In addition, the studies vary considerably in terms of the number of sample units (teams or firms), which ranged from 9 to 87,000, and of the nature of these units (e.g., top management teams vs. basketball teams). Because sample characteristics may cause differences in team or firm structure and human resource systems (e.g., pay gap, which refers to the pay ratio of the highest- to the lowest-paid employees), it is essential to examine their potential effects on research conclusions. A more detailed discussion on research context selection (e.g., sample distribution, the number of units, unit size, and pay gap) is presented in the Method section.

At this point, there seems to be some uncertainty within the research literature as to whether different disparity measures (i.e., CV, Gini, SD, Theil, and RMD) are interchangeable (Study 1). If they are not interchangeable, what is the impact of using an inappropriate measure (Study 2)? Furthermore, it is not clear whether sample and unit characteristics can affect the impact that misusing disparity measures might have (Studies 1 and 2). Thus, in the present research we examined these issues using computer simulations.

Study 1

Study 1 focused on the relations among CV, Gini, SD, Theil, and RMD. Although previous studies have found that CV, Gini, SD, Theil, and RMD are strongly related, the overall body of evidence has been fairly limited and involved few research contexts. Thus, it is unclear how well these relations generalize across sample distributions, numbers of units, and variable ranges. If there are no significant differences among CV, Gini, SD, Theil, and RMD, researchers could interchange them with little consequence. Alternatively, if the correlations are weak under some conditions, this suggests that some measures might not be suitable for assessing disparity.

Method

In our simulation, the sample refers to a certain number of individuals randomly selected from a population. In each simulated sample, the randomly generated individuals belonged to a specific number of teams (e.g., 50), and each team was composed of a specific number of team members (e.g., 5–15). Similar to previous research (Allen, Stanley, Williams, & Ross, 2007; Biemann & Kearney, 2010), the simulations involved the following three steps. First, for each specific context (i.e., each combination of a distribution, a specific “pay gap,” and a number of teams), we randomly generated 1,000 samples using a Visual Basic for Applications (VBA) program. Second, on the basis of these data, we calculated CV, Gini, SD, Theil, and RMD for each team in each sample. Third, we calculated the correlation coefficients among these five indexes in each sample and examined whether the mean values of these correlations differed. Pay data were randomly generated on the basis of a two-sided truncated distribution (Robert, 1995), which is a conditional distribution in which the domain of pay data is restricted (e.g., 1 < pay < 10). The reason for restricting the domain of the pay data is that we needed to control the effects of pay gap (see the following section).

Because the sampling correlation coefficients are not normally distributed, researchers have suggested using Fisher’s z transformation to avoid underestimation of the population correlation (average correlation coefficients; Silver & Dunlap, 1987). In the present simulations, we converted the (Pearson or Spearman) correlation coefficient in each simulated sample to Fisher’s z transformation, averaged the zs, and transformed them back to the (Pearson or Spearman) correlation coefficient.
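The averaging procedure described above can be sketched as follows. The function name `average_correlation` and the three example correlations are hypothetical and serve only to show that averaging in the z metric gives a different (here, larger) value than a simple arithmetic mean of the correlations.

```python
import math


def average_correlation(rs):
    """Average correlations via Fisher's z: r -> atanh(r), mean, then tanh back.

    Each r must lie strictly between -1 and 1, because atanh is undefined at +/-1.
    """
    zs = [math.atanh(r) for r in rs]
    return math.tanh(sum(zs) / len(zs))


rs = [0.70, 0.90, 0.99]          # hypothetical correlations from three samples
print("simple mean   :", sum(rs) / len(rs))
print("Fisher z mean :", average_correlation(rs))
```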

Sample distributions

Disparity is an asymmetric phenomenon (Harrison & Sin, 2006); thus, the sample distributions are not always normal distributions. For example, if one considers power or pay, team leaders are typically a few dominant individuals who have more power or higher salaries, whereas other team members have less power or lower salaries (Smith et al., 2006). In these circumstances, it is difficult to guarantee a sample’s normality. Thus, two types of distributions were considered in the present simulations: a normal pay distribution and a skewed pay distribution (log-normal distribution).

Range of pay gap

Pay gap refers to the pay ratio of the highest- to the lowest-paid employees within each team. In order to justify the pay gap ranges that we incorporated into our simulations, we collected top management team (TMT) data as a reference. Specifically, we examined data from the 2012 annual reports of public companies published on the Yahoo Finance website (http://finance.yahoo.com/). As is shown in Table 3, pay gaps within TMTs differed across countries. Overall, the pay gap ratios drawn from the TMTs of 3,929 firms in 11 different countries ranged from 1:1 to 2,746:1, with a mean of 8.61:1. Because every public company only disclosed the pay of its CEO and the four other highest-paid executives, the actual pay gap may well be higher than these data suggest. Thus, in order to simplify the simulations, we set three ranges of pay data to control the gap—10:1, 100:1, and 1,000:1; these correspond to a low, moderate, and high pay gap, respectively. Through these controls, we could examine the effects of pay gap on the intercorrelations among CV, Gini, SD, Theil, and RMD.

Table 3 Pay gaps within top management teams in public companies in various countries (2012)

Sample size

The sample size, in this context, is the number of teams in our randomly generated sample. As is shown in Table 2, the sample sizes used in organizational disparity studies vary considerably. The median of sample sizes was 268, and the percentages of ≤50, ≤100, ≤200, ≤500, ≤1,000, and >1,000 teams were 11.4 %, 13.6 %, 18.2 %, 18.2 %, 9.1 %, and 29.5 %, respectively. In order to examine the effect of sample size on the relations among the various disparity measures, the simulations incorporated various typical sample sizes shown in Table 2 (specifically: 30, 50, 100, 200, 500, and 1,000 teams).

Consider, for example, a situation with a normal distribution, a low pay gap (10:1), and a sample size of 30. First, members’ pay data in the 30 teams were generated by the VBA program that we developed, such that the pay data satisfied a two-sided truncated normal distribution and the pay ratio of highest- to lowest-paid members was 10:1. Each team was composed of five to 15 members (Allen et al., 2007). This was in accordance with the typical team size shown in Table 2 (where the mean size is 7.69 per team). Next, 30 team pay disparity scores were calculated using the formulas associated with each of the five disparity measures (CV, Gini, SD, Theil, and RMD). After obtaining these data, we calculated correlation coefficients between every pair of the five disparity measures and transformed them to Fisher’s z. Finally, we repeated the above process 1,000 times, averaged the zs, and transformed them back to correlation coefficients.
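The procedure in this example can be approximated with the Python sketch below. It is not the VBA program used in the study, and several details are assumptions made for illustration: pay values are restricted to the open interval (low, high) by rejection sampling, so that the within-team pay ratio cannot exceed high/low; the location and scale of the normal and log-normal distributions are arbitrary; the measures take the standard population forms; and all function names are hypothetical.

```python
import math
import random


def truncated_draw(rng, low, high, skewed):
    """Rejection-sample one pay value restricted to (low, high)."""
    while True:
        # Location and scale values are illustrative assumptions.
        if skewed:
            value = rng.lognormvariate(1.0, 0.75)
        else:
            value = rng.gauss((low + high) / 2, (high - low) / 4)
        if low < value < high:
            return value


def measures(x):
    """Five disparity measures for one team (population formulas; an assumption)."""
    n = len(x)
    u = sum(x) / n
    sd = math.sqrt(sum((v - u) ** 2 for v in x) / n)
    return {
        "CV": sd / u,
        "Gini": sum(abs(a - b) for a in x for b in x) / (2 * n * n * u),
        "SD": sd,
        "Theil": sum((v / u) * math.log(v / u) for v in x if v > 0) / n,
        "RMD": sum(abs(v - u) for v in x) / (2 * n * u),
    }


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


def simulate(n_teams=30, n_reps=1000, low=1.0, high=10.0, skewed=False, seed=0):
    """Return the Fisher-z-averaged correlation for each pair of measures."""
    rng = random.Random(seed)
    names = ["CV", "Gini", "SD", "Theil", "RMD"]
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    z_sums = {pair: 0.0 for pair in pairs}
    for _ in range(n_reps):
        scores = {name: [] for name in names}
        for _ in range(n_teams):
            size = rng.randint(5, 15)                 # five to 15 members per team
            team = [truncated_draw(rng, low, high, skewed) for _ in range(size)]
            for name, value in measures(team).items():
                scores[name].append(value)
        for a, b in pairs:
            r = pearson(scores[a], scores[b])
            r = max(min(r, 0.999999), -0.999999)      # keep atanh finite
            z_sums[(a, b)] += math.atanh(r)
    # Average the z values, then transform back to the correlation metric.
    return {pair: math.tanh(z / n_reps) for pair, z in z_sums.items()}


if __name__ == "__main__":
    for pair, r in simulate(n_reps=200).items():
        print(pair, round(r, 3))
```

Other contexts can be explored by changing the arguments, for example `simulate(n_teams=100, high=1000.0, skewed=True)` for a skewed distribution with a 1,000:1 range. Because the distributional parameters are assumed, the output should not be expected to match the values reported in the Results section.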

Results

Correlations among CV, Gini, Theil, and RMD

As is shown in Fig. 3A–F, when dealing with normal distributions, the 95 % confidence intervals (CIs) of correlations between CV and Gini, CV and Theil, CV and RMD, Gini and Theil, Gini and RMD, and Theil and RMD ranged from .97 to 1.00, .94 to .99, .92 to .99, .93 to .99, .94 to .99, and .89 to .98, respectively. With skewed distributions, the 95 % CIs of correlations between CV and Gini, CV and Theil, CV and RMD, Gini and Theil, Gini and RMD, and Theil and RMD ranged from .86 to .99, .93 to .99, .76 to .98, .91 to .99, .90 to .99, and .88 to .98, respectively. Furthermore, the 95 % CIs of correlations between these measures became narrower as sample size increased, indicating that larger samples yield more stable correlations.

Fig. 3 Correlations among CV, Gini, SD, Theil, and RMD. Note: The x-axis represents different sample sizes, and the y-axis represents mean correlations. Each shaped point represents the correlation mean, and the whiskers around each mean encompass the middle 95 % of the range of correlations used to calculate each correlation mean. Different-shaped points represent different ranges of pay gap

Except for the average correlations observed between CV and RMD under the skewed sample distribution situation (.89–.94), all the other average correlations between these measures exceeded .94. Interestingly, as is shown in Fig. 3B, D, and F, the sample distribution had little effect on the average correlations between Theil and CV/Gini/RMD, which suggests that Theil has the most stable relationship with the other three measures in different distributions. For CV, Gini, and RMD, the average correlations between any two of them were a little lower in the skewed than in the normal distribution situation. Pay gap had little effect on the average correlations between these measures. As is shown in Fig. 3A–F, the correlations decreased as pay gap increased, especially in skewed distribution situations. Within a given combination of sample distribution and pay gap (e.g., normal/10:1, skewed/100:1), however, there were almost no differences in the average correlations across the different sample sizes.

The results above are based on Pearson correlation coefficients. We also did these simulations with Spearman correlations and found very similar results. In our simulations, Gini had the highest average correlations (Pearson = .971, Spearman = .970) with the other three measures in all situations. Next was Theil (Pearson = .966, Spearman = .967), and then CV (Pearson = .964, Spearman = .967) and RMD (Pearson = .955, Spearman = .953). Overall, these results show that CV, Gini, Theil, and RMD are strongly related in most cases.

Correlations between CV/Gini/Theil/RMD and SD

A somewhat different picture emerges when considering correlations between CV/Gini/Theil/RMD and SD. As is shown in Fig. 3G–J, when dealing with normal distributions, the 95 % CIs of correlations between CV and SD, Gini and SD, Theil and SD, and RMD and SD ranged from –.35 to .99, .04 to .98, .10 to .97, and .21 to .96, respectively. With skewed distributions, the 95 % CIs of correlations between CV and SD, Gini and SD, Theil and SD, and RMD and SD ranged from –.65 to .99, –.37 to .98, –.40 to .98, and –.23 to .96, respectively.

The average correlations between CV and SD, Gini and SD, Theil and SD, and RMD and SD were not very high, ranging from .44 to .86. Furthermore, pay gap had a significant effect on the average correlations between SD and other disparity measures. The correlations decreased significantly as pay gap increased, especially in skewed distribution situations. For example, the average correlations between SD and other disparity measures were all lower than .61 in the situation with a high pay gap (1,000:1) and a skewed distributed sample.

The results above are based on Pearson correlation coefficients. We also did these simulations with Spearman correlations and, again, found very similar results. For example, in all situations, the average Pearson and Spearman correlations between SD and CV, SD and Gini, SD and Theil, and SD and RMD were .730 and .724, .741 and .735, .738 and .763, and .727 and .721, respectively.

Overall, these results suggest that the relations between CV/Gini/Theil/RMD and SD are quite different—and much weaker—than the relations among CV, Gini, Theil, and RMD.

Effects of sample disparities on the relations between SD and other measures

Because previous studies have examined samples with quite different team disparity means (e.g., from .02 to .60 for average team Gini; see Table 2), we also calculated sample disparity (i.e., the mean values of all team Gini scores in each simulated sample). This allowed us to examine the effects of sample disparity on the relations between SD and other measures. In order to examine these effects, we begin with the following example. Consider a set of 1,000 samples, each with a normal distribution and a low pay gap (10:1). Shown in Fig. 4 are 1,000 points; these represent Fisher’s z transformation of the correlation coefficients between CV and SD in each of the 1,000 simulated samples. The x-axis represents the sample disparities, which were the mean values of all team Gini scores in each simulated sample. The y-axis represents Fisher’s z transformation of the correlation coefficients between team pay CV and team pay SD in each simulated sample. As can be seen, there is a strong negative relationship (r = –.98; see the fourth column of Table 4) between sample disparity and Fisher’s z transformation of the correlation between CV and SD. In order to save space, we will only provide a brief description of the final relations in different situations, as are shown in Table 4.

Fig. 4

One example of relations between the sample disparity mean and Fisher’s z transformation of correlation coefficients between SD and CV (normal distribution, pay gap = 10:1, and sample size = 1,000). Note: The x-axis represents sample disparities, which were the mean values of all team Gini scores in each simulated sample. The y-axis represents the Fisher’s z value of the correlation coefficient between team pay SD and team pay CV in each simulated sample

Table 4 Correlations between sample disparity mean and Fisher’s z transformation of the correlation coefficients of SD with other measures

Table 4 shows the correlations between the sample disparity mean and Fisher’s z transformation of the correlation between SD and the other disparity measures. As sample disparity increased, z decreased, especially for normal distributions and large sample sizes. Consider, for example, a situation with a low pay gap (10:1) and a normally distributed sample. The correlation between the sample disparity mean and Fisher’s z transformation of the correlation between CV and SD was –.88 in the small-sample-size (30) situations, whereas it was –.98 in the large-sample-size (1,000) situations.
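The analysis summarized in Fig. 4 and Table 4 can be sketched in a few lines of Python. This is a minimal illustration, not the VBA program used in the study: the pay generator `draw_team`, the team size of five, the use of population formulas (dividing by n), and the numbers of teams and samples are all assumptions introduced here so that the example runs quickly.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def team_stats(pay):
    """Return (SD, CV, Gini) for one team's pay values (population formulas)."""
    n = len(pay)
    mean = sum(pay) / n
    sd = math.sqrt(sum((p - mean) ** 2 for p in pay) / n)
    gini = sum(abs(a - b) for a in pay for b in pay) / (2 * n * n * mean)
    return sd, sd / mean, gini

def draw_team(rng, size=5, low=1.0, high=10.0):
    """ASSUMPTION: normal pay, redrawn until it falls inside [low, high] (10:1 gap)."""
    center, spread = (low + high) / 2, (high - low) / 6
    team = []
    while len(team) < size:
        p = rng.gauss(center, spread)
        if low <= p <= high:
            team.append(p)
    return team

def sample_point(rng, n_teams):
    """One simulated sample -> (sample disparity, Fisher z of r(CV, SD))."""
    stats = [team_stats(draw_team(rng)) for _ in range(n_teams)]
    sds, cvs, ginis = zip(*stats)
    return sum(ginis) / n_teams, math.atanh(pearson(cvs, sds))

rng = random.Random(2024)
points = [sample_point(rng, n_teams=100) for _ in range(200)]
disparities, zs = zip(*points)
print("r(sample disparity, Fisher z):", round(pearson(disparities, zs), 2))
```

Each element of `points` corresponds to one point in a plot like Fig. 4. The value printed depends on the assumed generator and should not be expected to match the entries of Table 4.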

On the whole, from Figs. 3 and 4 and Table 4, we can see that CV, Gini, Theil, and RMD are strongly related (although the correlations between CV and RMD were somewhat lower than the others), unless the sample distribution is seriously skewed, sample pay gaps are very wide, and sample disparity is very high. Under the latter conditions, however, SD is not strongly related to any of CV, Gini, Theil, or RMD.

Study 2

Although our literature review showed that RMD does not always satisfy the transfer principle, the results of Study 1 indicated that RMD is highly correlated with CV, Gini, and Theil, which satisfy all of the properties of disparity. This suggests that researchers using RMD to assess disparity may draw conclusions similar to those of researchers using CV, Gini, and Theil. Furthermore, our literature review also showed that SD does not satisfy some basic properties of disparity. The results of Study 1 indicate that CV/Gini/Theil/RMD and SD are not always strongly correlated and, as such, this evidence suggests that SD may not be a valid disparity measure. At this point, it is worth asking what the consequences are if researchers use SD to assess disparity, and in which contexts these consequences will be more or less serious. Some empirical studies have reported that the results based on different disparity measures are different (Bloom, 1999, p. 32; Curşeu et al., 2007, p. 197; Shaw et al., 2002, p. 503). Thus, it is necessary to examine whether inconsistent conclusions based on different measures are common. Study 2 was designed to address these questions. Specifically, we conducted statistical simulations to compare the extents to which CV, Gini, SD, Theil, and RMD predicted key outcome variables.

Method

The scores of the outcome variables were generated by the VBA program so that we could control the correlations between a specific team disparity measure and the outcome variables. First, consistent with the procedures followed in Study 1, we randomly generated 1,000 samples with specific distributions, specific sample sizes, and specific pay gaps. Second, we calculated the CV, Gini, SD, Theil, and RMD for each team in each sample. Third, we randomly generated outcome variables based on specific correlation coefficients between pay Gini and the outcome variables. We chose Gini as the reference measure because it had the highest correlation with the other disparity measures (we also used CV, Theil, or RMD as the reference measure and found that the results were similar to those with Gini). As in previous simulation research (Steel & Kammeyer-Mueller, 2002), three correlations were selected: .10, .30, and .50. Fourth, on the basis of the generated outcome-variable data, we calculated the correlations between CV and the outcome variables, SD and the outcome variables, Theil and the outcome variables, and RMD and the outcome variables. Consistent with Study 1, we applied a Fisher’s z transformation and a back-transformation. Finally, these results were compared with the parallel Gini–outcome correlations.

Consider, for example, a true correlation (e.g., r = .50) between pay disparity (predictor) and an outcome variable (criterion). In the situation with a sample size of 200, we randomly generated corresponding pay data (e.g., normal distribution and pay gap = 10:1) for the members of the 200 teams. Next, 200 team pay disparity scores (predictors) were calculated on the basis of the formula for each disparity measure (CV, Gini, SD, Theil, and RMD). Working from the assumption that Gini is the “correct” measure of disparity, 200 outcome scores (criterion) were generated by the VBA program that we developed, such that the correlation between the predictor (Gini) and the criterion variable was r = .50. After obtaining the generated outcome scores, we then calculated correlation coefficients between the other disparity measures (CV, SD, Theil, and RMD) and the outcome variable in each simulated sample and transformed them to Fisher’s z. Finally, we repeated the above process 1,000 times, averaged the zs, and transformed them back to correlation coefficients.
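The procedure described above can be sketched as follows for the SD–outcome case. This is a hedged illustration rather than the authors' VBA program: the helper `make_outcome` fixes the sample correlation with Gini exactly by construction, which is one possible reading of "generated ... such that the correlation ... was r = .50"; the pay generator `draw_team`, the team size of five, and the reduced number of replications are likewise assumptions made here.

```python
import math
import random

def standardize(values):
    """Rescale to mean 0 and population SD 1."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((v - m) ** 2 for v in values) / n)
    return [(v - m) / s for v in values]

def pearson(x, y):
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)

def gini(pay):
    n, mean = len(pay), sum(pay) / len(pay)
    return sum(abs(a - b) for a in pay for b in pay) / (2 * n * n * mean)

def pop_sd(pay):
    mean = sum(pay) / len(pay)
    return math.sqrt(sum((p - mean) ** 2 for p in pay) / len(pay))

def draw_team(rng, size=5, low=1.0, high=10.0):
    """ASSUMPTION: normal pay, redrawn until it falls inside [low, high] (10:1 gap)."""
    center, spread = (low + high) / 2, (high - low) / 6
    team = []
    while len(team) < size:
        p = rng.gauss(center, spread)
        if low <= p <= high:
            team.append(p)
    return team

def make_outcome(reference, r, rng):
    """Outcome scores whose sample correlation with `reference` equals r."""
    zx = standardize(reference)
    noise = standardize([rng.gauss(0, 1) for _ in reference])
    slope = sum(a * e for a, e in zip(zx, noise)) / len(zx)
    # Remove the part of the noise that is correlated with the reference measure.
    resid = standardize([e - slope * a for a, e in zip(zx, noise)])
    return [r * a + math.sqrt(1 - r * r) * e for a, e in zip(zx, resid)]

def mean_sd_outcome_r(r_gini, n_teams, n_reps, rng):
    """Average r(SD, outcome) over replications via Fisher's z."""
    zs = []
    for _ in range(n_reps):
        teams = [draw_team(rng) for _ in range(n_teams)]
        ginis = [gini(t) for t in teams]
        sds = [pop_sd(t) for t in teams]
        outcome = make_outcome(ginis, r_gini, rng)
        zs.append(math.atanh(pearson(sds, outcome)))
    return math.tanh(sum(zs) / len(zs))

rng = random.Random(7)
for r in (0.10, 0.30, 0.50):
    print(r, round(mean_sd_outcome_r(r, n_teams=200, n_reps=100, rng=rng), 2))
```

Replacing `pop_sd` with a function for CV, Theil, or RMD gives the corresponding comparison for the other measures. The printed values depend on the assumed pay generator and are not intended to reproduce the figures reported in the Results.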

Results

Differences between Gini–outcome relations and CV/Theil/RMD–outcome relations

As is shown in Figs. 5, 6, and 7, when the correlation between Gini and the outcome variable was .10, .30, or .50, the average correlations between CV/Theil/RMD and the outcome variable approximated .10, .30, or .50, respectively, for normal distributions and were slightly lower than these values for skewed distributions. In addition, the 95 % CIs of the correlations became narrower as sample size increased, indicating that larger samples increased the stability of these correlations. Although the range whiskers varied as the correlation between Gini and an outcome variable increased, these changes were slight. This pattern suggests that the results of using CV, Theil, or RMD to measure disparity were almost the same as those from using Gini, although for skewed distributions the relations changed a little.

Fig. 5

Correlations between CV and an outcome variable when correlations between Gini and the outcome variable were .10, .30, and .50, respectively. Note: The x-axes represent different sample sizes, and the y-axes represent mean correlations. Each shaped point represents the correlation mean, and different-shaped points represent different ranges of pay gap. The whiskers around each mean encompass the middle 95 % of the range of correlations used to calculate each correlation mean. The dotted lines indicate the correlation (.10, .30, or .50) between Gini and the outcome variable. For each sample size and the same-shaped points, the lower point represents the mean correlation between CV and the outcome variable when the correlation between Gini and the outcome variable was .10; the middle point represents the mean correlation between CV and the outcome variable when the correlation between Gini and the outcome variable was .30; and the higher point represents the mean correlation between CV and the outcome variable when the correlation between Gini and the outcome variable was .50

Fig. 6

Correlations between Theil and an outcome variable when correlations between Gini and the outcome variable were .10, .30, and .50, respectively. Note: The x-axes represent different sample sizes. The y-axes represent mean correlations. Each shaped point represents the correlation mean, and different-shaped points represent different ranges of pay gap. The whiskers around each mean encompass the middle 95 % of the range of correlations used to calculate each correlation mean. The dotted lines indicate the correlation (.10, .30, or .50) between Gini and the outcome variable. For each sample size and the same-shaped points, the lower point represents the mean correlation between Theil and the outcome variable when the correlation between Gini and the outcome variable was .10; the middle point represents the mean correlation between Theil and the outcome variable when the correlation between Gini and the outcome variable was .30; and the higher point represents the mean correlation between Theil and the outcome variable when the correlation between Gini and the outcome variable was .50

Fig. 7

Correlations between RMD and an outcome variable when correlations between Gini and the outcome variable were .10, .30, and .50, respectively. Note: The x-axes represent different sample sizes. The y-axes represent mean correlations. Each shaped point represents the correlation mean, and different-shaped points represent different ranges of pay gap. The whiskers around each mean encompass the middle 95 % of the range of correlations used to calculate each correlation mean. The dotted lines indicate the correlation (.10, .30, or .50) between Gini and the outcome variable. For each sample size and the same-shaped points, the lower point represents the mean correlation between RMD and the outcome variable when the correlation between Gini and the outcome variable was .10; the middle point represents the mean correlation between RMD and the outcome variable when the correlation between Gini and the outcome variable was .30; and the higher point represents the mean correlation between RMD and the outcome variable when the correlation between Gini and the outcome variable was .50

Differences between Gini–outcome relations and SD–outcome relations

As is shown in Fig. 8, in some specific situations (i.e., skewed distribution and relatively small sample size), the correlations between Gini and outcome variables and the correlations between SD and outcome variables can be opposite in sign. For a given sample distribution and pay gap, the average correlations between SD and outcome variables were similar across sample sizes, but the 95 % CIs of the correlations became narrower as sample size increased, indicating that larger samples increased the stability of these correlations.

Fig. 8

Correlations between SD and an outcome variable when correlations between Gini and the outcome variable were .10, .30, and .50, respectively. Note: The x-axes represent different sample sizes. The y-axes represent mean correlations. Each shaped point represents the correlation mean, and different-shaped points represent different ranges of pay gap. The whiskers around each mean encompass the middle 95 % of the range of correlations used to calculate each correlation mean. The dotted lines indicate the correlation (.10, .30, or .50) between Gini and the outcome variable. For each sample size and the same-shaped points, the lower point represents the mean correlation between SD and the outcome variable when the correlation between Gini and the outcome variable was .10; the middle point represents the mean correlation between SD and the outcome variable when the correlation between Gini and the outcome variable was .30; and the higher point represents the mean correlation between SD and the outcome variable when the correlation between Gini and the outcome variable was .50

Consider, for example, a correlation of r = .30. As is shown in Fig. 8, when the correlation between Gini and an outcome variable was .30, the sample size was 100, and the pay gap was 100:1, the average correlations between SD and the same outcome variable were .22 for normal distributions and .18 for skewed distributions, indicating that the underestimation effect caused by using SD to measure disparity was more pronounced in samples with skewed distributions. Furthermore, the average correlations between SD and the outcome variable decreased as the pay gap increased. These results suggest that when SD is used in place of Gini, the mean correlations between the predictor and criterion variables are underestimated, especially when the pay distribution is skewed and the pay gap is large.

Interestingly, as can be seen, the range whiskers became wider as the correlation between Gini and the outcome variable increased. Consider, for example, a low correlation between Gini and the outcome variable (e.g., r = .10). In normally distributed samples, with a sample size of 200 and a sample pay gap of 10:1, the 95 % CI of Fisher’s z transformation of the correlations between SD and the outcome ranged from –.00 to .16. However, with the same sample characteristics and a high correlation between Gini and the outcome variable (r = .50), the 95 % CI of Fisher’s z transformation of the correlations between SD and the outcome ranged from .32 to .55. The interval width was thus .16 in the low-correlation situation, whereas it was .23 in the high-correlation situation. This suggests that when the true correlation between disparity and an outcome is high, conclusions drawn when SD is used to measure disparity might become increasingly unstable.

General discussion

To our knowledge, the present study is the first quantitative study to examine the differences among disparity measures. In economics and sociology research, CV, Gini, SD, Theil, and RMD are the most widely used measures of disparity (Allison, 1978; Besley & Burgess, 2003), but practice in organizational research differs somewhat: Organizational researchers examining disparity typically use CV, Gini, and SD rather than either of the other two measures. In what follows, and on the basis of the studies above, we offer suggestions about how disparity in organizational research should be assessed.

CV, Gini, Theil, and RMD

Consistent with most previous studies from various fields (Bendel, Higgins, Teberg, & Pyke, 1989; Christie & Barling, 2014; Fredrickson et al., 2010; He & Huang, 2011; Kawachi & Kennedy, 1997; Trevor & Wazeter, 2006), our simulation results showed that CV, Gini, Theil, and RMD are strongly related (Study 1) and that the correlates of any two of these four measures are very similar (Study 2). Although Allison (1978) did not recommend using RMD to assess disparity because it does not always satisfy the transfer principle, Study 1 showed that RMD was highly correlated with CV, Gini, and Theil, especially for normal distributions. Furthermore, Study 2 showed that the average correlation between RMD and criterion variables was close to the correlation between the true measure and these criterion variables.

Other studies, however, have reported relations between some disparity measures (e.g., CV and Gini) that are fairly moderate (e.g., .71, Bloom, 1999; .75, Shaw et al., 2002), and much lower than those observed in our simulation research. Possible reasons for this discrepancy are as follows. First, Bloom did not use original pay data; instead, he used the logarithm of pay, which could have increased the sample pay gap. On the basis of our simulations, the average correlations between any two of CV, Gini, Theil, and RMD decreased as the pay gap increased. With respect to this, it is worth noting that the mean Gini coefficient in Bloom’s research was .60, which represents a very high value in our Table 2. Our simulation also showed that the relations between CV and Gini decreased as sample disparity increased. Second, in the Shaw et al. study, the authors mentioned that “it was practically impossible to obtain a complete pay distribution from each organization,” and thus, they “estimated annual pay for . . . the three categories (minimum, middle, and maximum)” and estimated “the number of individuals reported to be making near the minimum, middle, and highest pay levels” on the basis of managers’ reports (Shaw et al., 2002, pp. 500–501). These estimated values might have affected the sample characteristics (e.g., distribution and pay gap), which, as our data suggest, affect the relations between any two of these measures.

Suggestion 1: Researchers should use untransformed, accurate, raw data to assess organizational disparity. Logarithmic transformation, or data estimated from a limited number of categories, may change the sample distribution, sample disparity, and sample pay gap, which in turn affect the correlations between any two of CV, Gini, Theil, and RMD and the consistency of conclusions based on these measures.

Interestingly, Curşeu et al. (2007) reported that the effects of CV and Gini on criterion variables were different. They suggested that these differences raised the question of whether the Gini measure was appropriate for operationalizing disparity. Similar questions have been raised in other empirical studies (Bloom, 1999, p. 32; Shaw et al., 2002, p. 503). Our simulation results suggest, however, that at least for normal distributions and log-normal distributions, researchers need not worry too much about this issue. It is also important to note that the sample size in the Curşeu et al. study was relatively small (44 groups). On the basis of our simulations, it appears that the correlations between any two of CV, Gini, Theil, and RMD are less stable in small than in large samples; thus, a low correlation is more likely to be observed with a small sample size. We suggest that researchers report the correlations between these measures when they conduct disparity research with a small sample size. If the correlation between these measures is relatively low, or the conclusions based on these measures are inconsistent, this may indicate that the sample size and/or distribution is causing problems. In that case, increasing the sample size will be helpful. On the one hand, as is shown in Fig. 3, increasing the sample size can decrease the probability of low correlations. On the other hand, increasing the sample size may bring sample distributions closer to the population distribution.

Suggestion 2: Considering that CV, Gini, Theil, and RMD are highly correlated, researchers examining organizational disparity should use one measure in their primary analysis and the other measures as the basis of a robustness test. If the results are different, some problems may exist that could be addressed by increasing the sample size.

Is SD a valid disparity measure?

Our simulation results suggest that researchers should not use SD to assess disparity. SD is neither asymmetric nor scale invariant. Furthermore, the maximum condition of SD is different from those of CV, Gini, Theil, and RMD. Although correlations between CV/Gini/Theil/RMD and SD decreased as sample disparity increased, CV/Gini/Theil/RMD and SD can be positively related, negatively related, or unrelated, depending on the sample distribution, sample disparity, pay gap, and sample size. For example, when samples were normally distributed, large, and involved narrow pay gaps, the mean correlations between CV/Gini/Theil/RMD and SD were relatively high (around .80). In contrast, with skewed distributions, small sample sizes, and wide pay gaps, correlations between CV/Gini/Theil/RMD and SD could be zero, or even negative. These results suggest that researchers should not rely on the results of the few previous studies with high correlation coefficients between CV/Gini/Theil/RMD and SD as their rationale for using SD to measure disparity.

Our simulation results also indicate that using SD as a disparity measure would underestimate the true relations between disparity and the outcome variables, especially in samples with a skewed distribution and a wide pay gap. Although larger samples could increase the stability of the correlations between SD and outcome variables, they would not eliminate the underestimation effect. Using SD as a disparity measure might lead to quite different conclusions, which in turn increases the probability of Type II error.

Suggestion 3: SD is not a valid disparity measure. Although SD is highly related to CV, Gini, Theil, and RMD in some conditions, it cannot satisfy the properties of maximum, scale invariance, and asymmetry. Using SD to represent disparity considerably underestimates the mean correlations between predictor and outcome variables.

Furthermore, some researchers have pointed out that the CV measure (the SD divided by the mean) is an interaction effect between the SD and the inverse of the mean (Harrison & Klein, 2007; Sørensen, 2002). Thus, they suggested using the following regression model to examine the above interactive effect:

$$ \mathrm{Y}={\beta}_0+{\beta}_1SD+{\beta}_2\left(1/\mathrm{Mean}\right)+{\beta}_3\left(SD/\mathrm{Mean}\right)+\varepsilon . $$

However, on the basis of the results of our simulations, SD and CV are highly related in many situations, and this will create a serious multicollinearity problem and may lead to a serious computational problem. This problem has been reported in some empirical studies (e.g., Meslec & Curşeu, 2013). Our solution for this problem is that the interactive term should be a product of the centered SD and centered 1/Mean, rather than SD/Mean. Thus, if researchers want to investigate the interactive effect of the SD and the inverse of the mean, we suggest that they use the following regression model to avoid the multicollinearity problem:

$$ \mathrm{Y}={\beta}_0+{\beta}_1\left(\mathrm{centered}\;SD\right)+{\beta}_2\left(\mathrm{centered}\;1/\mathrm{Mean}\right)+{\beta}_3\left(\mathrm{centered}\;SD\times \mathrm{centered}\;1/\mathrm{Mean}\right)+\varepsilon . $$
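As an illustration of how the predictors of the centered model might be constructed from team-level summaries, consider the following minimal sketch. The helper names (`center`, `centered_terms`) and the six teams' SD and mean values are hypothetical and serve only to show the construction; estimating the regression coefficients would additionally require an ordinary least squares routine, which is not shown.

```python
import math

def center(values):
    """Subtract the mean so the column averages to zero."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def centered_terms(team_sds, team_means):
    """Build the three predictors of the centered model for a list of teams."""
    c_sd = center(team_sds)
    c_inv = center([1.0 / m for m in team_means])
    interaction = [a * b for a, b in zip(c_sd, c_inv)]
    return c_sd, c_inv, interaction

# Hypothetical team-level summaries (SD and mean of pay for six teams).
team_sds = [2.0, 3.5, 1.0, 4.0, 2.5, 3.0]
team_means = [20.0, 25.0, 15.0, 40.0, 30.0, 18.0]

c_sd, c_inv, interaction = centered_terms(team_sds, team_means)
cv = [s / m for s, m in zip(team_sds, team_means)]

# Compare how strongly SD correlates with each candidate interaction term.
print("r(SD, SD/Mean)              =", round(pearson(team_sds, cv), 2))
print("r(SD, centered interaction) =", round(pearson(c_sd, interaction), 2))
```

Printing both correlations lets a researcher check, for a given data set, how much of the overlap between SD and the interaction term is removed by centering.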

Limitations and future directions

There are some limitations to our study. First, we did not focus on disparity measures for nonratio variables. Some researchers have suggested that CV, Gini, Theil, and RMD should only be applied to ratio-level variables (Allison, 1978; Bedeian & Mossholder, 2000; Harrison & Klein, 2007). If researchers apply CV, Gini, Theil, and RMD to data based on nonratio scales, they may draw erroneous conclusions. Consider, for example, an interval-scale variable. Suppose that there are two four-person teams: Team A (10, 10, 16, 16) and Team B (10, 10, 11, 16). We calculate disparity scores and find that Team A (CV = 0.23, Gini = .12, Theil = 0.03, and RMD = .12) is more unequal than Team B (CV = 0.21, Gini = .10, Theil = 0.02, and RMD = .09). If we change the zero point by subtracting 10 (the values thus become 0, 0, 6, and 6 for Team A, and 0, 0, 1, and 6 for Team B), however, we find that Team A (CV = 1.00, Gini = .50, Theil = 0.69, and RMD = .50) is more equal than Team B (CV = 1.42, Gini = .68, Theil = 0.98, and RMD = .61). The conclusions about disparity on an interval scale thus reverse depending on the choice of zero point. Future research will therefore need to explore new measures for nonratio variables.
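The two-team example can be checked with a short Python sketch. The formulas below are assumptions in the sense that they are not restated in this section: the SD divides by n (population form), Gini is the mean absolute difference divided by twice the mean, the Theil term for a zero value is taken as 0, and RMD is the sum of absolute deviations divided by 2n times the mean. With these choices, the sketch reproduces the values reported above.

```python
import math

def disparity_scores(values):
    """Return (CV, Gini, Theil, RMD) for one team, using population formulas."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)  # divide by n
    cv = sd / mean
    gini = sum(abs(a - b) for a in values for b in values) / (2 * n * n * mean)
    # Theil index; a zero value contributes 0 (the limit of s*ln(s) as s -> 0).
    theil = sum((v / mean) * math.log(v / mean) for v in values if v > 0) / n
    rmd = sum(abs(v - mean) for v in values) / (2 * n * mean)
    return cv, gini, theil, rmd

team_a = [10, 10, 16, 16]
team_b = [10, 10, 11, 16]

for label, shift in (("original scale", 0), ("zero point shifted by 10", 10)):
    a = disparity_scores([v - shift for v in team_a])
    b = disparity_scores([v - shift for v in team_b])
    print(label)
    print("  Team A:", [round(s, 2) for s in a])
    print("  Team B:", [round(s, 2) for s in b])
```

On the original scale every measure is larger for Team A, whereas after the shift every measure is larger for Team B, which is the reversal described in the text.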

Second, we focused exclusively on measures of objective disparity, and neglected members’ subjective perceptions of unit disparity, which might be related to, but different from, objective disparity. Indeed, disparity may matter only if members recognize it; thus, perceived disparity might moderate the relations between objective disparity and outcomes. Capturing these mechanisms would require further study and some exploration of the best way to measure subjective perceptions of disparity.