
Diversity and interdisciplinarity: how can one distinguish and recombine disparity, variety, and balance?

  • Loet Leydesdorff
Open Access
Article

Abstract

The dilemma that remained unsolved using Rao-Stirling diversity, namely how variety and balance can be combined into “dual concept diversity” (Stirling in SPRU electronic working paper series no. 28. http://www.sussex.ac.uk/Units/spru/publications/imprint/sewps/sewp28/sewp28.pdf, 1998, p. 48f.), can be clarified by using Nijssen et al.’s (Coenoses 13(1):33–38, 1998) argument that the Gini coefficient is a perfect indicator of balance. However, the Gini coefficient is not an indicator of variety; this latter term can be operationalized independently as relative variety. The three components of diversity (variety, balance, and disparity) can thus be clearly distinguished and independently operationalized as measures varying between zero and one. In the empirical case, the new diversity indicator spans a wider range of values than Rao-Stirling diversity and thus has more resolving power.

Keywords

Diversity · Gini · Measurement · Rao-Stirling · Balance

Introduction

Rao-Stirling diversity is increasingly used as a measure of interdisciplinarity in bibliometrics (e.g., Rafols and Meyer 2010; Leydesdorff et al. 2017; cf. Zhou et al. 2012). In a brief communication entitled “The Repeat Rate: From Hirschman to Stirling,” Ronald Rousseau argues that this index (Rao 1982) or its monotone transformations (Zhang et al. 2016) includes the three aspects of variety, balance, and disparity as distinguished, for example, by Stirling (2007) and Rafols and Meyer (2010). Rao-Stirling diversity, however, is defined in terms of two factors, as follows:
$$\Delta = \sum_{\substack{i,j = 1 \\ i \ne j}}^{n} (p_{i} p_{j})(d_{ij})$$
(1)
where \(d_{ij}\) is a disparity measure between two classes i and j, and \(p_{i}\) is the proportion of elements assigned to each class i.
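
As a minimal sketch of Eq. (1), the following Python function sums the product of the two factors over all ordered pairs of different classes. The function name `rao_stirling` and the three-class example values are assumptions made for illustration; they are not taken from the study.

```python
def rao_stirling(p, d):
    """Sum of p_i * p_j * d_ij over all ordered pairs with i != j (Eq. 1)."""
    n = len(p)
    return sum(p[i] * p[j] * d[i][j]
               for i in range(n) for j in range(n) if i != j)


# Hypothetical three-class example (values invented for illustration).
p = [0.5, 0.3, 0.2]          # proportions, summing to one
d = [[0.0, 0.4, 0.9],        # symmetrical disparity matrix
     [0.4, 0.0, 0.7],
     [0.9, 0.7, 0.0]]

print(rao_stirling(p, d))

# With every off-diagonal disparity set to one, only the left-hand factor
# remains and the result equals the Gini-Simpson index 1 - sum(p_i ** 2).
ones = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
print(rao_stirling(p, ones), 1 - sum(x * x for x in p))
```

The second call illustrates why the left-hand factor alone is a Simpson-type index: the disparity term only reweights the pairs that this factor already counts.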

I added the brackets in Eq. (1) to show that Rao-Stirling diversity is composed of two factors: The right-hand factor operationalizes disparity; the left-hand one is also known as the Hirschman–Herfindahl or Simpson index.1 It seems to me that two factors cannot cover three concepts unless one uses two words for the same operationalization. However, one can argue that the left-hand term of Eq. (1) measures both variety and balance.

Rousseau et al. (1999) already addressed this issue when they wrote (at p. 213):

It is generally agreed that diversity combines two aspects: species richness and evenness. Disagreement arises at how these two aspects should be combined, and how to measure this combination, which is then called “diversity”.

How and why are these two aspects of diversity compared and integrated in the left-hand term of Eq. (1)? Following Junge (1994), Stirling (1998, at p. 48) suggests labeling this integration as “dual concept diversity” and notes that “to many authorities in ecology, dual concept diversity is synonymous with diversity itself.”

Using Fig. 1, Stirling (1998) shows the possible dilemma when combining the two “subordinate properties” into a single “dual concept” when he formulates as follows at p. 48:
Fig. 1 The question of the relative priority assigned to variety and balance in dual concept diversity. Source: Stirling (1998, at p. 49)

Where variety is held to be the most important property, System C might reasonably be held to be most (dual concept) diverse. Where a greater priority is attached to the evenness in the balance between options, System A might be ranked highest. In addition, there are a multitude of possible intermediate possibilities, such as System B.

Stirling (1998) then discusses at length the possibility of using the Simpson index (Simpson 1949) or Shannon diversity (Shannon and Weaver 1949) for the measurement of “dual concept diversity” and concludes (on p. 57) that ‘there are good reasons to prefer the Shannon function as a robust general “non-parametric” measure of dual concept diversity’ (boldface and italics in the original). Nevertheless, the Simpson index is most frequently used in the literature for this purpose (Stirling 2007).2

An alternative operationalization of diversity

In a study of the Lorenz curve as a graphical representation of “evenness” or “balance,” Nijssen et al. (1998) proved mathematically that both the Gini index and the coefficient of variation (that is, the standard deviation divided by the mean of the distribution or, in formula format, σ/μ) are perfect indicators of balance (Rousseau, personal communication, 16 March 2018). (The coefficient of variation is not bounded between zero and one.) Additionally, the Gini index is not a measure of variety (Rousseau 2018, p. 6).

Variety is the number of categories into which system elements are apportioned (Stirling 2007, p. 709), for example, the number of species (N) in an eco-system (MacArthur 1965). The problem with integrating this measure into an index of diversity might be that N is not bounded between zero and one. I suggest solving this by using n/N, that is, the relative variety: n denotes the number of categories with values larger than zero, whereas N denotes the number of available categories. In the example elaborated below, among the 654 classes for patents in the so-called CPC classification, Amsterdam’s portfolio at the USPTO has a non-zero value in 131 of them; the relative variety n/N is therefore 131/654 = 0.20.

In the discussion about related and unrelated variety, Frenken et al. (2007) proposed Shannon entropy as a measure of “unrelated variety.” As a measure of “related variety” these authors use Theil’s (1972) decomposition algorithm for appreciating the grouping (cf. Leydesdorff 1991). However, this measure assumes the ex ante definition of relevant groups. The disparity matrix operates in terms of ecological distances and is not based on such a priori assumptions about structure (Izsák and Papp 1995). In other words, relatedness is already covered by the term \(d_{ij}\) in Eq. (1). Shannon entropy can be normalized relative to the maximum entropy and then varies between zero and one (or as percentage entropy). If one wishes to appreciate not only the number of categories but also the values, Shannon entropy could be an alternative for measuring variety. Grouping is not advised, because the disparity measure already covers the ecological distances that can indicate relatedness.
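
The normalization of Shannon entropy mentioned above can be sketched as follows. This is an illustration under an assumption: the maximum entropy is taken here as the logarithm of the number of available categories (the length of the input vector), whereas one could also argue for the number of valued categories. The function name `normalized_shannon` is hypothetical.

```python
import math


def normalized_shannon(counts):
    """Shannon entropy of the shares in `counts`, divided by log(len(counts)).

    Assumption: the maximum entropy is log(N), with N the number of
    available categories, zeros included.
    """
    total = sum(counts)
    n_categories = len(counts)
    if total <= 0 or n_categories < 2:
        return 0.0
    # p * log(1/p) per valued category; empty categories contribute nothing.
    entropy = sum((c / total) * math.log(total / c) for c in counts if c > 0)
    return entropy / math.log(n_categories)


print(normalized_shannon([5, 5, 5, 5]))   # even distribution over all categories
print(normalized_shannon([10, 0, 0, 0]))  # everything in a single category
print(normalized_shannon([6, 3, 1, 0]))   # an intermediate case
```

Because the ratio is independent of the base of the logarithm, the same value is obtained in bits or in nats.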

An empirical elaboration

If one wishes to consider the three aspects of diversity (variety, balance, and disparity) in a single measure equivalent to Rao-Stirling diversity, one can thus multiply the corresponding elements in the disparity matrix with the values of the Gini index and relative variety. All three factors are bounded between zero and one and are decomposable. (Note that the coefficient of variation is not bounded between zero and one.) One thus obtains the following diversity measure for each unit of analysis (e.g., city) c:
$${\text{Div}}_{c} = \left( n_{c} / N \right) \times {\text{Gini}}_{c} \times \left[ \sum_{\substack{i,j = 1 \\ i \ne j}}^{n_{c}} \frac{d_{ij}}{n_{c} (n_{c} - 1)} \right]$$
The first term is the relative variety as defined above: the number of valued categories for this city (excluding zeros) divided by the total number of categories (in this case, 654, including zeros). The second term is the Gini coefficient of the vector of these \(n_{c}\) categories. The third term is the disparity averaged over all ordered pairs of cells i and j along this vector, excluding the main diagonal.3 The normalization in the third component ensures that the disparity values (e.g., the Euclidean distance or (1 − cosine)) function as weightings between zero and one. As in the case of Rao-Stirling diversity, the cosine values are taken from the symmetrical cosine matrix among the 654 column vectors of the asymmetrical matrix of 654 categories versus more than five million patents used by Leydesdorff et al. (2017).4
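
The three factors can be combined in a few lines of Python. The sketch below follows the formula as printed above; it is not the routine referred to in the footnotes. The function names, the treatment of units with fewer than two valued categories (set to zero here), and the four-category example are assumptions made for illustration.

```python
def gini(values):
    """Gini coefficient using ranks of the values sorted in ascending order."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)


def diversity(counts, disparity):
    """Relative variety * Gini * mean pairwise disparity for one unit."""
    valued = [k for k, c in enumerate(counts) if c > 0]
    n_c = len(valued)
    if n_c < 2:
        return 0.0  # assumption: without pairs the disparity term is set to zero
    variety = n_c / len(counts)                   # n_c / N
    balance = gini([counts[k] for k in valued])   # Gini over valued categories
    pair_sum = sum(disparity[i][j] for i in valued for j in valued if i != j)
    mean_disparity = pair_sum / (n_c * (n_c - 1))
    return variety * balance * mean_disparity


# Hypothetical unit with N = 4 categories, three of them valued.
counts = [6, 3, 1, 0]
d = [[0.0, 0.4, 0.9, 0.5],
     [0.4, 0.0, 0.7, 0.6],
     [0.9, 0.7, 0.0, 0.8],
     [0.5, 0.6, 0.8, 0.0]]
print(diversity(counts, d))
```

In this example the relative variety is 3/4, the Gini coefficient of (6, 3, 1) is 1/3, and the mean disparity among the three valued categories is 2/3, so that the product is 1/6. The fourth category does not contribute to the second and third factors because its value is zero.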

For the computation of the Gini coefficient, I follow Buchan’s (2002) simplification of the computation which the author formulated as follows:

The classical definition of G appears in the notation of the theory of relative mean difference:
$$G = \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} |x_{i} - x_{j}|}{2n^{2} \bar{x}}$$
(2)
where x is an observed value, n is the number of values observed and \(\bar{x}\) is the mean value.
If the x values are first placed in ascending order, such that each x has rank i, some of the comparisons above can be avoided and computation is quicker:
$$G = \frac{2}{{n^{2} \bar{x}}}\sum\limits_{i = 1}^{n} {i(x_{i} - \bar{x})}$$
$$G = \frac{{\sum\nolimits_{i = 1}^{n} {(2i - n - 1)x_{i} } }}{{n\sum\nolimits_{i = 1}^{n} {x_{i} } }}$$
where x is an observed value, n is the number of values observed and i is the rank of values in ascending order.
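
The equivalence of the three expressions quoted above can be checked numerically. The following sketch implements each of them in plain Python; the function names and the test vector are assumptions made for illustration.

```python
def gini_pairwise(xs):
    """Eq. (2): sum of absolute differences over all ordered pairs."""
    n = len(xs)
    mean = sum(xs) / n
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * n * mean)


def gini_rank_deviation(xs):
    """First ranked form: (2 / (n^2 * mean)) * sum of i * (x_i - mean)."""
    s = sorted(xs)
    n = len(s)
    mean = sum(s) / n
    return 2 / (n * n * mean) * sum(i * (x - mean) for i, x in enumerate(s, start=1))


def gini_rank_weighted(xs):
    """Second ranked form: sum of (2i - n - 1) * x_i over n * sum of x_i."""
    s = sorted(xs)
    n = len(s)
    return sum((2 * i - n - 1) * x for i, x in enumerate(s, start=1)) / (n * sum(s))


values = [6, 1, 3]  # hypothetical values; the ranked forms sort them first
print(gini_pairwise(values), gini_rank_deviation(values), gini_rank_weighted(values))
```

For the vector (1, 3, 6) all three functions return 1/3 up to floating-point rounding. The pairwise form needs n² comparisons, whereas the ranked forms need one sort and a single pass over the values.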
In the following example from Leydesdorff et al. (2017), disparity is measured as (1 − cosine) between each pair of distributions (Jaffe 1989). In this study we compared 20 cities (four cities each in five countries) in terms of the Rao-Stirling diversity of their patent portfolios operationalized as patents granted by the USPTO in 2016. The results are provided in Table 5 (at p. 1584) of that study and compared below in Table 1 with the values for the new indicator in the right-hand column.
Table 1 Rank-ordered list of twenty cities in terms of the diversity of patent portfolios granted at the USPTO in 2016

Source of the left-hand column: Leydesdorff et al. (2017, Table 5 at p. 1584)

| City | Rao-Stirling | City | Diversity (new indicator) |
|---|---|---|---|
| Paris | 0.83 | Shanghai | 0.74 |
| Boston | 0.80 | Beijing | 0.71 |
| Rotterdam | 0.80 | Paris | 0.62 |
| Jerusalem | 0.79 | Atlanta | 0.61 |
| Atlanta | 0.78 | Boulder | 0.52 |
| Eindhoven | 0.78 | Boston | 0.49 |
| Nanjing | 0.78 | Berkeley | 0.45 |
| Berkeley | 0.78 | Telaviv | 0.42 |
| Shanghai | 0.78 | Eindhoven | 0.41 |
| Boulder | 0.78 | Haifa | 0.36 |
| Beersheva | 0.78 | Grenoble | 0.33 |
| Amsterdam | 0.76 | Jerusalem | 0.29 |
| Beijing | 0.71 | Toulouse | 0.27 |
| Toulouse | 0.71 | Amsterdam | 0.25 |
| Telaviv | 0.71 | Nanjing | 0.23 |
| Marseille | 0.70 | Rotterdam | 0.15 |
| Haifa | 0.69 | Beersheva | 0.12 |
| Grenoble | 0.69 | Dalian | 0.10 |
| Dalian | 0.69 | Wageningen | 0.09 |
| Wageningen | 0.50 | Marseille | 0.03 |

Whereas the left-hand ranking is counter-intuitive in placing Rotterdam and Jerusalem above, for example, Shanghai and Beijing, these latter two cities are attributed the highest rankings using the new indicator. Furthermore, the Rao-Stirling diversity ranges from 0.50 (Wageningen) to 0.83 (Paris), whereas the new diversity index ranges from 0.03 (Marseille) to 0.74 (Shanghai). Figure 2 shows these ranges graphically. The new diversity measure has a stronger resolving power than Rao-Stirling diversity.
Fig. 2 Rao-Stirling diversity and the diversity measure proposed here for the patent portfolios of twenty cities in terms of the CPC classification for patents granted at the USPTO in 2016

The cities under study were chosen so that one could expect differences among them; however, these were smaller than expected using Rao-Stirling diversity. For example, Boston and Rotterdam had the same value on this indicator (0.80). Using the new diversity measure, however, the diversity of the portfolio of Boston (0.49) is more than three times that of Rotterdam (0.15).
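
The difference in resolving power can be made explicit with the values of Table 1. The sketch below uses the two-decimal values as printed in that table; the helper names are assumptions, and counting distinct values is only one possible way of expressing resolving power.

```python
# Values transcribed from Table 1 (two decimals, as printed).
rao = {
    "Paris": 0.83, "Boston": 0.80, "Rotterdam": 0.80, "Jerusalem": 0.79,
    "Atlanta": 0.78, "Eindhoven": 0.78, "Nanjing": 0.78, "Berkeley": 0.78,
    "Shanghai": 0.78, "Boulder": 0.78, "Beersheva": 0.78, "Amsterdam": 0.76,
    "Beijing": 0.71, "Toulouse": 0.71, "Telaviv": 0.71, "Marseille": 0.70,
    "Haifa": 0.69, "Grenoble": 0.69, "Dalian": 0.69, "Wageningen": 0.50,
}
new = {
    "Shanghai": 0.74, "Beijing": 0.71, "Paris": 0.62, "Atlanta": 0.61,
    "Boulder": 0.52, "Boston": 0.49, "Berkeley": 0.45, "Telaviv": 0.42,
    "Eindhoven": 0.41, "Haifa": 0.36, "Grenoble": 0.33, "Jerusalem": 0.29,
    "Toulouse": 0.27, "Amsterdam": 0.25, "Nanjing": 0.23, "Rotterdam": 0.15,
    "Beersheva": 0.12, "Dalian": 0.10, "Wageningen": 0.09, "Marseille": 0.03,
}


def value_range(scores):
    """Difference between the highest and the lowest value."""
    return max(scores.values()) - min(scores.values())


def distinct_values(scores):
    """Number of different values at the printed precision."""
    return len(set(scores.values()))


for label, scores in (("Rao-Stirling", rao), ("New indicator", new)):
    print(label, round(value_range(scores), 2), distinct_values(scores))

print(round(new["Boston"] / new["Rotterdam"], 2))
```

At two decimals, the twenty cities share nine different Rao-Stirling values within a range of 0.33, whereas the new indicator assigns twenty different values within a range of 0.71. The ratio between Boston and Rotterdam on the new indicator is 3.27.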

Table 2 provides the relevant correlations among these twenty cities: Spearman’s rank-order correlations are shown in the upper triangle and Pearson correlations in the lower triangle. As could be expected, Rao-Stirling diversity correlates with the Simpson index and Shannon diversity, but not with the Gini coefficient.5 The new diversity measure is not significantly correlated with Rao-Stirling diversity or the Simpson index but is, not surprisingly, correlated with the Gini coefficient and with variety; in addition to disparity, these two factors are constitutive of diversity in this approach.
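
For readers who wish to see how the two types of coefficients in Table 2 differ, the following sketch computes both in plain Python. The data are hypothetical and chosen only to show the difference; the underlying values of Table 2 are not reproduced here. Spearman’s coefficient is obtained as the Pearson correlation of the ranks, with tied values receiving their average rank.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks in ascending order; ties share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of positions pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: y increases monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y), spearman(x, y))
```

In this example the rank orders are identical, so Spearman’s coefficient is one, while the Pearson coefficient is lower because the relation is not linear. With many tied values, as in the Rao-Stirling column of Table 1, the average-rank step matters for the rank-order coefficient.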
Table 2 Pearson correlation coefficients in the lower triangle and Spearman’s rank-order correlations in the upper triangle

| | Rao-Stirling | Diversity | Gini | Variety | Simpson | Shannon |
|---|---|---|---|---|---|---|
| Rao-Stirling | | 0.438 | −0.084 | 0.470* | 0.874** | 0.893** |
| Diversity | 0.417 | | 0.747** | 0.997** | 0.416 | 0.589** |
| Gini | −0.078 | 0.765** | | 0.721** | −0.092 | 0.060 |
| Variety | 0.492* | 0.992** | 0.714** | | 0.443 | 0.623** |
| Simpson | 0.896** | 0.346 | −0.114 | 0.412 | | 0.925** |
| Shannon | 0.890** | 0.600** | 0.184 | 0.684** | 0.835** | |

**Correlation is significant at the 0.01 level (2-tailed)

*Correlation is significant at the 0.05 level (2-tailed)

Conclusions and discussion

The dilemma that remained unsolved using Rao-Stirling diversity, namely how variety and balance can be combined into “dual concept diversity” (Stirling 1998, p. 48f.), can be clarified using Nijssen et al.’s (1998) argument that the Gini coefficient is a perfect indicator of balance. Since the Gini coefficient is not an indicator of variety, this latter term can be operationalized independently as relative variety and thus be bounded between zero and one. The three components of diversity (variety, balance, and disparity) can thus be clearly distinguished and independently operationalized as measures varying between zero and one. In the empirical case, the new diversity indicator spans a wider range of values (0.03 to 0.74) than Rao-Stirling diversity (0.50 to 0.83) and thus has more resolving power. However, the new diversity indicator did not correlate significantly with Rao-Stirling diversity.

I do not wish to argue for this diversity measure beyond its status as another indicator. In contrast to the confusion hitherto, however, the new indicator is based on the solution made possible by Nijssen et al.’s (1998) proof and Stirling’s (1998) analysis of the literature. The independent operationalization of the three aspects of diversity distinguished by Stirling (1998, 2007) provides a more reliable ground than “dual” or higher-order concepts. A routine is provided at http://www.leydesdorff.net/software/diverse for computing both Rao-Stirling diversity and this new indicator (see the Appendix).

The diversity issue is important for the measurement of interdisciplinarity and knowledge integration in science and technology studies. However, the further elaboration of this relevance requires yet another discussion (e.g., Wagner et al. 2011). In Leydesdorff et al. (2018), for example, we argued that a high diversity (measured as Rao-Stirling diversity) in citing patterns may indicate esoteric originality at the journal level and perhaps trans-disciplinarity more than knowledge integration. Uzzi et al. (2013), by contrast, considered atypical combinations in citing behavior at the paper level as an indication of novelty.

Footnotes

  1. \(\sum_{ij} p_{i} p_{j} = 1\) when taken over all i and j. The Simpson index is equal to \(\sum_{i} p_{i}^{2}\), and the Gini-Simpson index to \(1 - \sum_{i} p_{i}^{2}\).

  2. Hill (1973) derived that the two indicators can be considered as variants of a general formalization. See Stirling (1998, at p. 49f) for the elaboration.

  3. If one wished, one could replace the variety measure with the Shannon function.

  4. A routine for the computation can be found at http://www.leydesdorff.net/software/diverse (see the Appendix).

  5. As can be expected, the coefficient of variation correlated significantly with the Gini coefficient: both Spearman’s rank-order correlation and the Pearson correlation are .94 (p < .01; n = 20).

Notes

Acknowledgement

I thank Ronald Rousseau for comments and stimulating discussions about previous versions of this communication.

References

  1. Buchan, I. (2002). Calculating the Gini coefficient of inequality. https://www.nibhi.org.uk/Training/Statistics/Gini%20coefficient.doc.
  2. Frenken, K., Van Oort, F., & Verburg, T. (2007). Related variety, unrelated variety and regional economic growth. Regional Studies, 41(5), 685–697.
  3. Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432.
  4. Izsák, J., & Papp, L. (1995). Application of the quadratic entropy indices for diversity studies of drosophilid assemblages. Environmental and Ecological Statistics, 2(3), 213–224.
  5. Jaffe, A. B. (1989). Characterizing the “technological position” of firms, with application to quantifying technological opportunity and research spillovers. Research Policy, 18(2), 87–97.
  6. Junge, K. (1994). Diversity of ideas about diversity measurement. Scandinavian Journal of Psychology, 35(1), 16–26.
  7. Leydesdorff, L. (1991). The static and dynamic analysis of network data using information theory. Social Networks, 13(4), 301–345.
  8. Leydesdorff, L., Kogler, D. F., & Yan, B. (2017). Mapping patent classifications: portfolio and statistical analysis, and the comparison of strengths and weaknesses. Scientometrics, 112(3), 1573–1591.
  9. Leydesdorff, L., Wagner, C. S., & Bornmann, L. (2018). Betweenness and diversity in journal citation networks as measures of interdisciplinarity–A tribute to Eugene Garfield. Scientometrics, 114(2), 567–592. https://doi.org/10.1007/s11192-017-2528-2.
  10. MacArthur, R. H. (1965). Patterns of species diversity. Biological Reviews, 40(4), 510–533.
  11. Nijssen, D., Rousseau, R., & Van Hecke, P. (1998). The Lorenz curve: A graphical representation of evenness. Coenoses, 13(1), 33–38.
  12. Rafols, I., & Meyer, M. (2010). Diversity and network coherence as indicators of interdisciplinarity: Case studies in bionanoscience. Scientometrics, 82(2), 263–287.
  13. Rao, C. R. (1982). Diversity: Its measurement, decomposition, apportionment and analysis. Sankhyā: The Indian Journal of Statistics, Series A, 44(1), 1–22.
  14. Rousseau, R. (2018). The repeat rate: From Hirschman to Stirling. Under submission.
  15. Rousseau, R., Van Hecke, P., Nijssen, D., & Bogaert, J. (1999). The relationship between diversity profiles, evenness and species richness based on partial ordering. Environmental and Ecological Statistics, 6(2), 211–223.
  16. Shannon, C. E., & Weaver, W. (1949). The mathematical theory of communication. Urbana: University of Illinois Press.
  17. Simpson, E. H. (1949). Measurement of diversity. Nature, 163(4148), 688.
  18. Stirling, A. (1998). On the economics and analysis of diversity. SPRU Electronic Working Paper Series No. 28. http://www.sussex.ac.uk/Units/spru/publications/imprint/sewps/sewp28/sewp28.pdf.
  19. Stirling, A. (2007). A general framework for analysing diversity in science, technology and society. Journal of the Royal Society, Interface, 4(15), 707–719.
  20. Theil, H. (1972). Statistical decomposition analysis. Amsterdam: North-Holland.
  21. Uzzi, B., Mukherjee, S., Stringer, M., & Jones, B. (2013). Atypical combinations and scientific impact. Science, 342(6157), 468–472.
  22. Wagner, C. S., Roessner, J. D., Bobb, K., Klein, J. T., Boyack, K. W., Keyton, J., et al. (2011). Approaches to understanding and measuring interdisciplinary scientific research (IDR): A review of the literature. Journal of Informetrics, 5(1), 14–26.
  23. Zhang, L., Rousseau, R., & Glänzel, W. (2016). Diversity of references as an indicator for interdisciplinarity of journals: Taking similarity between subject fields into account. Journal of the Association for Information Science and Technology, 67(5), 1257–1265. https://doi.org/10.1002/asi.23487.
  24. Zhou, Q., Rousseau, R., Yang, L., Yue, T., & Yang, G. (2012). A general framework for describing diversity within systems and similarity between systems with applications in informetrics. Scientometrics, 93(3), 787–812.

Copyright information

© The Author(s) 2018

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Amsterdam School of Communication Research (ASCoR), University of Amsterdam, Amsterdam, The Netherlands
