# Diversity and interdisciplinarity: how can one distinguish and recombine disparity, variety, and balance?


## Abstract

The dilemma which remained unsolved using Rao-Stirling diversity, namely how variety and balance can be combined into “dual concept diversity” (Stirling 1998, p. 48f.), can be clarified by using Nijssen et al.’s (1998) argument that the Gini coefficient is a perfect indicator of balance. However, the Gini coefficient is not an indicator of variety; this latter term can be operationalized independently as relative variety. The three components of diversity (variety, balance, and disparity) can thus be clearly distinguished and independently operationalized as measures varying between zero and one. In the empirical case, the new diversity indicator spans a wider range of values and therefore has more resolving power.

## Keywords

Diversity, Gini, Measurement, Rao-Stirling, Balance

## Introduction

*three* aspects of variety, balance, and disparity as distinguished, for example, by Stirling (2007) and Rafols and Meyer (2010). Rao-Stirling diversity, however, is defined in terms of two factors, as follows:

\[
\Delta = \sum_{i,j\,(i \neq j)} (p_{i} p_{j})(d_{ij}) \tag{1}
\]

where *d*_{ ij } is a disparity measure between two classes *i* and *j*, and *p*_{ i } is the proportion of elements assigned to each class *i*.
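As an illustration of Eq. (1), the following minimal sketch sums the products of proportions and disparities over all pairs of different classes. The function name `rao_stirling` and the three-class example values are hypothetical and are not taken from the study.

```python
def rao_stirling(proportions, disparity):
    """Sum p_i * p_j * d_ij over all ordered pairs of classes with i != j."""
    total = 0.0
    for i, p_i in enumerate(proportions):
        for j, p_j in enumerate(proportions):
            if i != j:
                total += p_i * p_j * disparity[i][j]
    return total


# Hypothetical three-class portfolio (illustrative values only).
p = [0.5, 0.3, 0.2]
d = [
    [0.0, 0.4, 0.9],
    [0.4, 0.0, 0.6],
    [0.9, 0.6, 0.0],
]
delta = rao_stirling(p, d)

# With d_ij = 1 for every pair of different classes, the disparity factor
# drops out and the sum reduces to the Gini-Simpson index, 1 - sum(p_i ** 2).
all_ones = [[0.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
gini_simpson = rao_stirling(p, all_ones)
```

The second call shows why the left-hand factor can be read on its own: when all classes are equally distant from one another, only the distribution of the proportions matters.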

I added the brackets in Eq. (1) to show that Rao-Stirling diversity is composed of two factors: The right-hand factor operationalizes disparity; the left-hand one is also known as the Hirschman–Herfindahl or Simpson index.^{1} It seems to me that two factors cannot cover three concepts unless one uses two words for the same operationalization. However, one can argue that the left-hand term of Eq. (1) measures both variety and balance.

It is generally agreed that diversity combines two aspects: species richness and evenness. Disagreement arises at how these two aspects should be combined, and how to measure this combination, which is then called “diversity”.

How and why are these two aspects of diversity compared and integrated in the left-hand term of Eq. (1)? Following Junge (1994), Stirling (1998, at p. 48) suggests labeling this integration as “dual concept diversity” and notes that “to many authorities in ecology, dual concept diversity is synonymous with diversity itself.”

Where variety is held to be the most important property, System C might reasonably be held to be most (dual concept) diverse. Where a greater priority is attached to the evenness in the balance between options, System A might be ranked highest. In addition, there are a multitude of possible intermediate possibilities, such as System B.

Stirling (1998) then discusses at length the possibility of using the Simpson index (Simpson 1949) or Shannon diversity (Shannon and Weaver 1949) for the measurement of “dual concept diversity” and concludes (on p. 57) that ‘***there are good reasons to prefer the Shannon function as a robust general “non-parametric” measure of dual concept diversity***’ (boldface and italics in the original). Nevertheless, the Simpson index is most frequently used in the literature for this purpose (Stirling 2007).

^{2}

## An alternative operationalization of diversity

In a study of the Lorenz curve as a graphical representation of “evenness” or “balance,” Nijssen et al. (1998) proved mathematically that both the Gini index and the coefficient of variation (that is, the standard deviation divided by the mean of the distribution or, in formula format, *σ*/*μ*) are perfect indicators of balance (Rousseau, personal communication, 16 March 2018). (The coefficient of variation is not bounded between zero and one.) Additionally, the Gini index is *not* a measure of variety (Rousseau 2018, p. 6).
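The remark that the coefficient of variation is not bounded between zero and one can be checked with a small sketch. It assumes the population standard deviation; the text does not say whether the population or the sample version is meant, and the function name is hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population version assumed)."""
    return statistics.pstdev(values) / statistics.mean(values)


# A perfectly even distribution has no variation at all.
even = coefficient_of_variation([5, 5, 5])

# A fully concentrated distribution over four categories exceeds one.
concentrated = coefficient_of_variation([0, 0, 0, 10])
```

For the concentrated case the result is the square root of three, which is why this measure cannot be used directly as a component that should vary between zero and one.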

*Variety* is the number of categories into which system elements are apportioned (Stirling 2007, p. 709), for example, the number of species (*N*) in an eco-system (MacArthur 1965). The problem with integrating this measure into an index of diversity might be that *N* is not bounded between zero and one. I suggest solving this by using *n*/*N*, that is, the relative variety: *n* denotes the number of categories with values larger than zero, whereas *N* denotes the number of available categories. In the example elaborated below, among the 654 classes for patents in the so-called CPC classification, Amsterdam’s portfolio at the USPTO shows a value in 131 of them: the relative variety *n*/*N* is therefore 131/654 = 0.20.
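Relative variety can be sketched in a few lines. Only the numbers 131 and 654 come from the text; the individual counts in the vector are placeholders, and the function name is hypothetical.

```python
def relative_variety(counts):
    """n / N: the share of available categories with a value above zero."""
    available = len(counts)                              # N
    occupied = sum(1 for value in counts if value > 0)   # n
    return occupied / available


# Amsterdam's case: 131 occupied classes among 654 available ones.
# The counts themselves are placeholders, not data from the study.
amsterdam = [1] * 131 + [0] * (654 - 131)
print(round(relative_variety(amsterdam), 2))  # 0.2
```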

In the discussion about related and unrelated variety, Frenken et al. (2007) proposed Shannon entropy as a measure of “unrelated variety.” As a measure of “related variety” these authors use Theil’s (1972) decomposition algorithm for appreciating the grouping (cf. Leydesdorff 1991). However, this measure assumes the *ex ante* definition of relevant groups. The disparity matrix operates in terms of ecological distances and is not based on such a priori assumptions about structure (Izsák and Papp 1995). In other words, relatedness is already covered by the term *d*_{ ij } in Eq. (1). Shannon entropy can be normalized relative to the maximum entropy and then varies between zero and one (or as percentage entropy). If one wishes to appreciate not only the number of categories but also the values, Shannon entropy could be an alternative for measuring variety. Grouping is not advised, because the disparity measure already covers the ecological distances that can indicate relatedness.
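The normalization of Shannon entropy mentioned above can be sketched as follows. The sketch assumes that the maximum entropy is the logarithm of the number of available categories *N*; taking the number of occupied categories instead would give a different normalization. The function name is hypothetical.

```python
import math


def normalized_entropy(counts):
    """Shannon entropy divided by log(N), where N is the number of categories."""
    total = sum(counts)
    n_categories = len(counts)
    if total <= 0 or n_categories < 2:
        return 0.0
    entropy = 0.0
    for value in counts:
        if value > 0:
            share = value / total
            entropy -= share * math.log(share)
    return entropy / math.log(n_categories)


uniform = normalized_entropy([5, 5, 5, 5])      # maximum: all categories equal
single = normalized_entropy([10, 0, 0, 0])      # minimum: one category only
```

Because both the entropy and its maximum use the same logarithm, the choice of base does not affect the normalized value.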

## An empirical elaboration

The new diversity indicator combines three components for each unit of analysis *c*:

\[
Div_{c} = \frac{n_{c}}{N} \times Gini_{c} \times \frac{\sum_{i=1,\, j=1,\, i \neq j}^{n_{c}} d_{ij}}{n_{c}(n_{c} - 1)}
\]

The first component is the relative variety, the second is the Gini coefficient of the distribution over the *n*_{c} categories, and the third weights the disparity as a measure for each observation, permutating the cells *i* and *j* along the vector but excluding the main diagonal.^{3} The normalization in the third component is needed to ensure that the disparity values (e.g., the Euclidean distance or (1 − *cosine*)) function as weightings between zero and one. As in the case of Rao-Stirling diversity, the cosine values are taken from the symmetrical cosine matrix among the 654 column vectors of the asymmetrical matrix of 654 categories versus more than five million patents used by Leydesdorff et al. (2017).^{4}
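The combination of the three components can be sketched as below. This is not the routine referred to in the footnote. It assumes that the components are multiplied, that the disparity is averaged over the ordered pairs of occupied categories only, and that the balance term is computed elsewhere and passed in. All names and example values are hypothetical.

```python
def normalized_disparity(disparity, occupied):
    """Sum of d_ij over occupied categories with i != j, over n_c * (n_c - 1)."""
    n_c = len(occupied)
    if n_c < 2:
        return 0.0  # fewer than two categories: no pairs to compare
    total = sum(disparity[i][j] for i in occupied for j in occupied if i != j)
    return total / (n_c * (n_c - 1))


def diversity_indicator(counts, disparity, balance):
    """Relative variety times the balance term times the normalized disparity."""
    occupied = [k for k, value in enumerate(counts) if value > 0]
    variety = len(occupied) / len(counts)
    return variety * balance * normalized_disparity(disparity, occupied)


# Hypothetical portfolio over four categories, one of them empty.
counts = [4, 0, 3, 3]
disparity = [
    [0.0, 0.9, 0.5, 0.8],
    [0.9, 0.0, 0.7, 0.6],
    [0.5, 0.7, 0.0, 0.2],
    [0.8, 0.6, 0.2, 0.0],
]
# balance=0.1 is a placeholder, not a Gini value computed from these counts.
value = diversity_indicator(counts, disparity, balance=0.1)
```

Since each of the three terms lies between zero and one whenever the disparities do, the product stays within the same bounds.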

For the computation of the Gini coefficient, I follow Buchan’s (2002) simplification of the computation, which the author formulated as follows:

\[
G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \left| x_{i} - x_{j} \right|}{2 n^{2} \bar{x}}
\]

where *x* is an observed value, *n* is the number of values observed, and \(\bar{x}\) is the mean value.

If the *x* values are first placed in ascending order, such that each *x* has rank *i*, some of the comparisons above can be avoided and computation is quicker:

\[
G = \frac{\sum_{i=1}^{n} (2i - n - 1)\, x_{i}}{n \sum_{i=1}^{n} x_{i}}
\]

where *x* is an observed value, *n* is the number of values observed, and *i* is the rank of values in ascending order.
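The two formulations of the Gini coefficient can be compared in a short sketch. The function names are hypothetical; both functions assume non-negative values with a positive sum.

```python
def gini_pairwise(values):
    """Sum of |x_i - x_j| over all pairs, divided by 2 * n^2 * mean."""
    n = len(values)
    mean = sum(values) / n
    if mean <= 0:
        raise ValueError("values must have a positive sum")
    differences = sum(abs(a - b) for a in values for b in values)
    return differences / (2 * n * n * mean)


def gini_ranked(values):
    """Sum of (2i - n - 1) * x_i over ascending ranks i, divided by n * sum(x)."""
    ordered = sorted(values)
    n = len(ordered)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(ordered, start=1))
    return weighted / (n * total)


sample = [4, 1, 3, 2]
print(gini_pairwise(sample), gini_ranked(sample))  # 0.25 0.25
```

The ranked version needs one sort and a single pass instead of comparing every pair, which matters for vectors as long as the 654 categories used here. Note that for a finite vector the maximum of both versions is (*n* − 1)/*n* rather than exactly one.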

In Leydesdorff et al. (2017), disparity was measured as (1 − *cosine*) between each two distributions (Jaffe 1989). In that study, we compared 20 cities (four cities each in five countries) in terms of the Rao-Stirling diversity of their patent portfolios, operationalized as patents granted by the USPTO in 2016. The results are provided in Table 5 (at p. 1584) of that study and compared below in Table 1 with the values for the new indicator in the right-hand column.

Table 1. Rank-ordered list of twenty cities in terms of the diversity of patent portfolios granted at the USPTO in 2016

| City | Rao-Stirling | City | Diversity |
|---|---|---|---|
| Paris | 0.83 | Shanghai | 0.74 |
| Boston | 0.80 | Beijing | 0.71 |
| Rotterdam | 0.80 | Paris | 0.62 |
| Jerusalem | 0.79 | Atlanta | 0.61 |
| Atlanta | 0.78 | Boulder | 0.52 |
| Eindhoven | 0.78 | Boston | 0.49 |
| Nanjing | 0.78 | Berkeley | 0.45 |
| Berkeley | 0.78 | Telaviv | 0.42 |
| Shanghai | 0.78 | Eindhoven | 0.41 |
| Boulder | 0.78 | Haifa | 0.36 |
| Beersheva | 0.78 | Grenoble | 0.33 |
| Amsterdam | 0.76 | Jerusalem | 0.29 |
| Beijing | 0.71 | Toulouse | 0.27 |
| Toulouse | 0.71 | Amsterdam | 0.25 |
| Telaviv | 0.71 | Nanjing | 0.23 |
| Marseille | 0.70 | Rotterdam | 0.15 |
| Haifa | 0.69 | Beersheva | 0.12 |
| Grenoble | 0.69 | Dalian | 0.10 |
| Dalian | 0.69 | Wageningen | 0.09 |
| Wageningen | 0.50 | Marseille | 0.03 |

*Source* of the left-hand column: Leydesdorff et al. (2017, Table 5 at p. 1584)

The cities under study were chosen so that one could expect differences among them; however, these were smaller than expected using Rao-Stirling diversity. For example, Boston and Rotterdam had the same value on this indicator. Using the new diversity measure, however, the diversity of the portfolio of Boston is more than three times higher than that of Rotterdam.

The new diversity measure is *not* significantly correlated with Rao-Stirling diversity or the Simpson index but, not surprisingly, it is significantly correlated with the Gini coefficient^{5} and with variety; these two factors are constitutive for the diversity in this approach, in addition to the disparity.

Table 2. Pearson correlation coefficients in the lower triangle and Spearman’s rank-order correlations in the upper triangle

| | Rao-Stirling | Diversity | Gini | Variety | Simpson | Shannon |
|---|---|---|---|---|---|---|
| Rao-Stirling | | 0.438 | −0.084 | 0.470* | 0.874** | 0.893** |
| Diversity | 0.417 | | 0.747** | 0.997** | 0.416 | 0.589** |
| Gini | −0.078 | 0.765** | | 0.721** | −0.092 | 0.060 |
| Variety | 0.492* | 0.992** | 0.714** | | 0.443 | 0.623** |
| Simpson | 0.896** | 0.346 | −0.114 | 0.412 | | 0.925** |
| Shannon | 0.890** | 0.600** | 0.184 | 0.684** | 0.835** | |
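The two kinds of coefficients reported in the correlation table can be sketched in plain Python. The sketch assumes that ties receive average ranks in the rank-order correlation. The five city values are taken from the rounded figures in the table of cities above, so the outcome is illustrative only and will not reproduce the coefficients computed over all twenty cities. The function names are hypothetical.

```python
def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (spread_x * spread_y) ** 0.5


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Paris, Boston, Rotterdam, Jerusalem, Atlanta (rounded values from the table).
rao = [0.83, 0.80, 0.80, 0.79, 0.78]
new = [0.62, 0.49, 0.15, 0.29, 0.61]
r_pearson = pearson(rao, new)
r_spearman = spearman(rao, new)
```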

## Conclusions and discussion

The dilemma which remained unsolved using Rao-Stirling diversity, namely how variety and balance can be combined into “dual concept diversity” (Stirling 1998, p. 48f.), can be clarified using Nijssen et al.’s (1998) argument that the Gini coefficient is a perfect indicator of balance. Since the Gini coefficient is not an indicator of variety, this latter term can be operationalized as relative variety and thus be bounded between zero and one. The three components of diversity (variety, balance, and disparity) can thus be clearly distinguished and independently operationalized as measures varying between zero and one. In the empirical case, the new diversity indicator spans a wider range of values and therefore has more resolving power. However, the new diversity indicator did not correlate significantly with Rao-Stirling diversity.

I do not want to argue for this diversity measure beyond the status of another indicator. In contrast to the confusion that has prevailed hitherto, however, the new indicator is based on the solution made possible by Nijssen et al.’s (1998) proof and Stirling’s (1998) analysis of the literature. The independent operationalization of the three aspects of diversity distinguished by Stirling (1998, 2007) provides a more reliable ground than “dual” or higher-order concepts. A routine is provided at http://www.leydesdorff.net/software/diverse for computing both Rao-Stirling diversity and this new indicator (see the Appendix).

The diversity issue is important for the measurement of interdisciplinarity and knowledge integration in science and technology studies. However, the further elaboration of this relevance requires yet another discussion (e.g., Wagner et al. 2011). In Leydesdorff et al. (2018), for example, we argued that a high diversity in citing patterns, measured as Rao-Stirling diversity, may indicate esoteric originality at the journal level and perhaps trans-disciplinarity more than knowledge integration. Uzzi et al. (2013), on the contrary, considered atypical combinations in citing behavior at the paper level as an indication of novelty.

## Footnotes

- 1. \(\sum_{ij} p_{i} p_{j} = 1\) when taken over all *i* and *j*. The Simpson index is equal to \(\sum_{i} p_{i}^{2}\), and the Gini-Simpson index to \(1 - \sum_{i} p_{i}^{2}\).
- 2.
- 3. If one wished, one could replace the variety measure with the Shannon function.
- 4. A routine for the computation can be found at http://www.leydesdorff.net/software/diverse (see the Appendix).
- 5. As can be expected, the coefficient of variation correlated significantly with the Gini coefficient: both Spearman’s rank-order correlation and the Pearson correlation are .94 (*p* < .01; *n* = 20).

## Notes

### Acknowledgement

I thank Ronald Rousseau for comments and stimulating discussions about previous versions of this communication.

## References

- Buchan, I. (2002). Calculating the Gini coefficient of inequality. https://www.nibhi.org.uk/Training/Statistics/Gini%20coefficient.doc.
- Frenken, K., Van Oort, F., & Verburg, T. (2007). Related variety, unrelated variety and regional economic growth. *Regional Studies, 41*(5), 685–697.
- Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. *Ecology, 54*(2), 427–432.
- Izsák, J., & Papp, L. (1995). Application of the quadratic entropy indices for diversity studies of drosophilid assemblages. *Environmental and Ecological Statistics, 2*(3), 213–224.
- Jaffe, A. B. (1989). Characterizing the “technological position” of firms, with application to quantifying technological opportunity and research spillovers. *Research Policy, 18*(2), 87–97.
- Junge, K. (1994). Diversity of ideas about diversity measurement. *Scandinavian Journal of Psychology, 35*(1), 16–26.
- Leydesdorff, L. (1991). The static and dynamic analysis of network data using information theory. *Social Networks, 13*(4), 301–345.
- Leydesdorff, L., Kogler, D. F., & Yan, B. (2017). Mapping patent classifications: Portfolio and statistical analysis, and the comparison of strengths and weaknesses. *Scientometrics, 112*(3), 1573–1591.
- Leydesdorff, L., Wagner, C. S., & Bornmann, L. (2018). Betweenness and diversity in journal citation networks as measures of interdisciplinarity: A tribute to Eugene Garfield. *Scientometrics, 114*(2), 567–592. https://doi.org/10.1007/s11192-017-2528-2.
- MacArthur, R. H. (1965). Patterns of species diversity. *Biological Reviews, 40*(4), 510–533.
- Nijssen, D., Rousseau, R., & Van Hecke, P. (1998). The Lorenz curve: A graphical representation of evenness. *Coenoses, 13*(1), 33–38.
- Rafols, I., & Meyer, M. (2010). Diversity and network coherence as indicators of interdisciplinarity: Case studies in bionanoscience. *Scientometrics, 82*(2), 263–287.
- Rao, C. R. (1982). Diversity: Its measurement, decomposition, apportionment and analysis. *Sankhyā: The Indian Journal of Statistics, Series A, 44*(1), 1–22.
- Rousseau, R. (2018). The repeat rate: From Hirschman to Stirling. Under submission.
- Rousseau, R., Van Hecke, P., Nijssen, D., & Bogaert, J. (1999). The relationship between diversity profiles, evenness and species richness based on partial ordering. *Environmental and Ecological Statistics, 6*(2), 211–223.
- Shannon, C. E., & Weaver, W. (1949). *The mathematical theory of communication*. Urbana: University of Illinois Press.
- Stirling, A. (1998). On the economics and analysis of diversity. *SPRU Electronic Working Paper Series* No. 28. http://www.sussex.ac.uk/Units/spru/publications/imprint/sewps/sewp28/sewp28.pdf.
- Stirling, A. (2007). A general framework for analysing diversity in science, technology and society. *Journal of the Royal Society, Interface, 4*(15), 707–719.
- Theil, H. (1972). *Statistical decomposition analysis*. Amsterdam: North-Holland.
- Uzzi, B., Mukherjee, S., Stringer, M., & Jones, B. (2013). Atypical combinations and scientific impact. *Science, 342*(6157), 468–472.
- Wagner, C. S., Roessner, J. D., Bobb, K., Klein, J. T., Boyack, K. W., Keyton, J., et al. (2011). Approaches to understanding and measuring interdisciplinary scientific research (IDR): A review of the literature. *Journal of Informetrics, 5*(1), 14–26.
- Zhang, L., Rousseau, R., & Glänzel, W. (2016). Diversity of references as an indicator for interdisciplinarity of journals: Taking similarity between subject fields into account. *Journal of the Association for Information Science and Technology, 67*(5), 1257–1265. https://doi.org/10.1002/asi.23487.
- Zhou, Q., Rousseau, R., Yang, L., Yue, T., & Yang, G. (2012). A general framework for describing diversity within systems and similarity between systems with applications in informetrics. *Scientometrics, 93*(3), 787–812.

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.