On the uncertainty of interdisciplinarity measurements due to incomplete bibliographic data
Abstract
The accuracy of interdisciplinarity measurements is directly related to the quality of the underlying bibliographic data. Existing indicators of interdisciplinarity are not capable of reflecting the inaccuracies introduced by incorrect and incomplete records because correct and complete bibliographic data can rarely be obtained. This is the case for the Rao–Stirling index, which cannot handle references that are not categorized into disciplinary fields. We introduce a method that addresses this problem. It extends the Rao–Stirling index to acknowledge missing data by calculating its interval of uncertainty using computational optimization. The evaluation of our method indicates that the uncertainty interval is not only useful for estimating the inaccuracy of interdisciplinarity measurements, but it also delivers slightly more accurate aggregated interdisciplinarity measurements than the Rao–Stirling index.
Keywords
Interdisciplinarity, Rao–Stirling index, Bibliometrics, Missing data, Uncertainty, Optimization, Spanning tree
Introduction
Most quantitative measures of the output of InterDisciplinary Research (IDR) rely on bibliometric methods. Since such methods are commonly used to inform policy in science and technology, they require reliable indicators and results. While analytical indicators and tools have been refined over time, their results are in most cases not precise. The accuracy of such indicators depends on the quality of the bibliographic data, which should be correct and complete. Unfortunately, gathering a correct and complete bibliographic dataset is complicated because not all scientific publications are indexed by digital libraries. Current bibliographic databases, such as the Web of Science (WoS) or Scopus, do not cover books, book chapters and many regional non-English journals in which some fields mainly publish. Even conference proceedings, which constitute the main publication venues in many applied fast-changing fields, are often not indexed. Combining and comparing records from different bibliographic sources mitigates this problem to some extent. However, an additional problem affects top–down approaches to measuring IDR such as the Rao–Stirling diversity index: the need for a predefined taxonomy of disciplines that classifies all publications in the dataset. This problem cannot be solved by comparing data gathered from different sources, because not all libraries classify their publications into a taxonomy of disciplines or use the same taxonomy, and even those that use a taxonomy might not classify all their indexed publications with it, as is the case for WoS. Manual classification of publications into disciplinary fields is also not viable for a large number of uncategorized publications. As a consequence, top–down measurements of IDR usually deliver proxy results.
In this paper we acknowledge the problem of dealing with incomplete data gathered from several libraries. We focus on the problem of uncategorized publications for the measurement of IDR with the Rao–Stirling index. We choose this index because it is a well-established bibliometric indicator that requires a complete categorization of all references into disciplinary fields, a requirement whose consequences have not received adequate attention in the literature. We propose a theoretical extension of the Rao–Stirling index to account for the uncertainty resulting from references that remain uncategorized.
Background
The field of measuring IDR heavily relies on bibliometric methods and data due to the widely held view that scientific research is disseminated via publications. Different types of approaches exist for measuring IDR, which have been accordingly endorsed for differing needs of analysis. For an extensive review of approaches, we refer to the work of Wagner et al. (2011). Among them, the most common method for measuring IDR is citation analysis, in which an exchange or integration among fields is captured via discipline-specific citations pointing to other fields. Two distinguishable strategies for measuring IDR are bottom–up and top–down. The first approach is based on clusters of articles without a predefined taxonomy of disciplines. The clustering is based on the structural relationships of a network of publications (Boyack and Klavans 2010; Chen et al. 2010; Leydesdorff 2007; Leydesdorff et al. 2013). In contrast, top–down approaches rely on a predefined taxonomy of disciplines that is used to classify publications into disciplinary fields (Leydesdorff et al. 2013; Porter and Rafols 2009; Rafols et al. 2012). While bottom–up approaches are suited for capturing emerging developments that do not fit into existing categories, the classification-based approach is useful for large-scale explorations, such as comparisons of areas of science using an extensive amount of data or the disciplinary breadth of research institutions. The latter approach is the focus of this paper.
The results of citation analyses are subject to the quality of bibliographic data in terms of completeness and accuracy. Well-established top–down methods used to analyze the number of disciplines cited by a publication or their degree of concentration, such as Shannon entropy (Shannon 1948) and the Herfindahl index (Rhoades 1993), are designed to be used with datasets with complete information, since they cannot acknowledge the degree of missing data. This is also the case for the Rao–Stirling diversity index, a more complete top–down index proposed by Porter et al. (2007) and Porter and Rafols (2009). Precise IDR measurement using these methods requires a bibliographic dataset with: (1) complete records of references, (2) a correct list of references for each publication, (3) accurate categorization of publications into disciplinary fields, and (4) the categorization of each reference into at least one discipline. The combination of such quality characteristics results in ground-truth bibliographic data, which is rarely attainable since no publication database provides adequate correctness and completeness with respect to both references and categorization into disciplinary fields.
Concerning references, verification mechanisms as discussed by van Raan (1996) are crucial to detect incomplete records of references and remove incorrect references in bibliographic sources, such as those encountered by Moed et al. (1995) and Chen et al. (2012). With regard to taxonomies of disciplines, their accuracy has been widely discussed in the literature without reaching consensus on an adequate one (National Research Council 2010; Rafols and Leydesdorff 2009). In spite of its weaknesses, the list of categories provided by WoS is the most widely used (Bensman and Leydesdorff 2009; Pudovkin and Garfield 2002). The exhaustive categorization of all references within a dataset into disciplinary fields remains an open issue that is under-discussed in the literature. Although the important consequences of missing data in bibliographic datasets have been acknowledged in the literature (Moed et al. 1985), to our knowledge the problem of uncategorized records in top–down IDR measurement has not been properly addressed. Some bibliometric studies minimize this problem by excluding uncategorized publications from the dataset. The use of the categories of WoS implies the exclusion of all publications other than journals indexed by WoS (i.e., proceedings papers, books, technical reports) (Bjurström and Polk 2011; Carley and Porter 2011; Chen et al. 2012). Other studies account for the percentage of uncategorized publications and compute the index on the categorized references (Rafols et al. 2012; Porter and Rafols 2009). These approaches do not take into account the potential diversity of the excluded or missing data; hence interdisciplinarity is underestimated.
A method that automates the assignment of disciplines was implemented by Ponomarev et al. (2013) in order to categorize authors into one of a small set of major research fields. It is based on aggregated information on the categories of the publications of the author and their references, for which disciplines are grouped into broad categories that relate to the research activity of the group of individuals. Disciplines unrelated to the research activity of the group of individuals are categorized as ‘others’. Therefore, it does not allow for the automatic assignment of specific categories loosely related to the selected major fields, which is needed to compute the Rao–Stirling index.
In the following, we propose a method that acknowledges missing data and determines the associated uncertainties (see “Method” section); its evaluation and discussion follow in the subsequent sections.
Method
Introduction
Missing Data
Problems arise when the disciplines of one or more references are unknown. As a consequence, \(\mathbf {c}\) cannot be determined and \(I\) is not well defined. The common approach is to simply omit these references and compute the index on the references categorized with disciplines (Bjurström and Polk 2011; Carley and Porter 2011; Chen et al. 2012; Rafols et al. 2012; Porter and Rafols 2009). Depending on the counts \(\mathbf {c}\) obtained from the categorized references, as well as the number of uncategorized references, the uncertainty can widely vary. For a single uncategorized reference among dozens categorized, the effect would be minor, whereas in the converse case, the uncertainty spans nearly the whole range of the index, rendering the initial estimate meaningless.
To capture the effects of missing data, we will compute the range in which the Rao–Stirling diversity \(I\) can vary when the uncategorized references are assigned to (sensible) arbitrary disciplines. While this range could be determined by enumerating all possible assignments and computing \(I\) for each, such an approach is computationally infeasible as it suffers from combinatorial explosion, i.e., an uncategorized reference can be assigned to \({N_{\mathcal {T}}}\) disciplines in \(2^{N_{\mathcal {T}}}\) ways. Instead, we will formulate the search for an upper and lower bound on \(I\) as an optimization problem. In the following, we present its basic formulation and several subsequent refinements.
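The enumeration described above can be illustrated on a toy taxonomy. The following is a minimal sketch, assuming the index takes the common form \(I = 1 - \sum_{i,j} s_{ij}\, p_i\, p_j\) with proportions \(p_i = c_i / \sum_j c_j\). The four-discipline similarity matrix, the counts, and the rule that an uncategorized reference adds one count to every discipline of its assigned subset are illustrative assumptions, not the data or the counting rule of this study.

```python
from itertools import combinations, product

def rao_stirling(counts, S):
    """Rao-Stirling diversity 1 - sum_ij s_ij p_i p_j for discipline counts."""
    total = sum(counts)
    p = [c / total for c in counts]
    n = len(p)
    return 1.0 - sum(S[i][j] * p[i] * p[j] for i in range(n) for j in range(n))

def nonempty_subsets(n, k):
    """All non-empty subsets of range(n) with at most k elements."""
    for size in range(1, k + 1):
        yield from combinations(range(n), size)

def brute_force_interval(counts, S, n_uncategorized, k):
    """Enumerate every assignment of the uncategorized references and
    return the smallest and largest index value encountered."""
    subsets = list(nonempty_subsets(len(counts), k))
    lo, hi = float("inf"), float("-inf")
    for assignment in product(subsets, repeat=n_uncategorized):
        c = list(counts)
        for subset in assignment:
            for i in subset:
                c[i] += 1  # assumed counting rule: one count per assigned discipline
        value = rao_stirling(c, S)
        lo, hi = min(lo, value), max(hi, value)
    return lo, hi

# Hypothetical symmetric similarity matrix for four disciplines.
S = [
    [1.0, 0.6, 0.1, 0.0],
    [0.6, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.1, 0.5, 1.0],
]
counts = [5, 2, 0, 0]  # categorized references per discipline (made up)
baseline = rao_stirling(counts, S)
lower, upper = brute_force_interval(counts, S, n_uncategorized=2, k=2)
print(f"index on categorized references only: {baseline:.3f}")
print(f"brute-force interval: [{lower:.3f}, {upper:.3f}]")
```

Even in this small setting the number of assignments grows as the number of admissible subsets raised to the number of uncategorized references, which is why exhaustive enumeration does not scale to a full taxonomy.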
Uncertainty Estimation
Constraint refinement
Discipline pruning
A reassignment of an uncategorized reference to an arbitrary subset of disciplines can lead to highly improbable results even when the cardinality of the subset is bounded as described in “Constraint refinement” section. This arises naturally due to the maximization of the Rao–Stirling diversity index in the aforementioned optimization problems. A concrete example could be a document in the field of computer science that exclusively cites previous works from its own discipline but has two uncategorized references. A possible reassignment that would significantly increase its diversity can be realized by assigning them to the unrelated disciplines of, for example, zoology and Slavic literature. While such an assignment is not invalid per se, it is nevertheless prohibitively unlikely, and in this section we present a method to exclude such improbable disciplines.
A simple straightforward solution would be to just eliminate all disciplines that are not already observed from the categorized references, i.e., to set the constraint \(n_i = 0\) (resp. \(p_i = 0\)), if \(c_i = 0\). The problem with this approach is that it does not allow for the introduction of new disciplines through the reassignment of uncategorized references, which would underestimate the achievable diversity significantly.
In contrast, we take the mutual similarities of different disciplines into account for which we utilize the similarity matrix \(\mathbf {S}\) as given in Eq. 1. If the categorized references are from closely related disciplines, we only permit very similar disciplines to participate in the reassignment procedure, whereas we allow a larger set of disciplines for categorized references belonging to a diverse set of disciplines.
Completeness: Each neighborhood should contain at least two observed disciplines. This ensures that each neighborhood includes at least all disciplines that are more similar than the next most similar known discipline.

Cohesion: The neighborhoods should form a single connected component to avoid having multiple disjoint discipline clusters. For documents with references in, for example, two dissimilar disciplines, an omission of this objective could lead to a set of permissible disciplines that are very similar to either of these two known disciplines without considering the disciplines in between them.

Conciseness: The neighborhoods should be chosen in such a way as to yield the smallest possible set of permissible disciplines that fulfills the previous objectives. The actual meaningfulness of the upper bound of the uncertainty interval is ensured in this way.
Computational methods
In this section, we describe the computational methods used to compute the solutions of the optimization problems stated in Eqs. 2 or 3 while taking the constraints in Eqs. 4–6 into account. We choose different solution strategies for finding the reassignments with the lowest possible diversity index \(I_-\) and the highest possible diversity index \(I_+\). The need for different strategies lies in the nature of the similarity measure between different disciplines, given by the similarity matrix \(\mathbf {S}\); it has to be positive semidefinite to yield a nonnegative diversity index for arbitrary discipline counts. The associated quadratic form \(\mathbf {c} \, \mathbf {S} \, \mathbf {c}^\intercal\) is thus a convex function in \(\mathbf {c}\), while \(-\mathbf {c} \, \mathbf {S} \, \mathbf {c}^\intercal\) is concave. Thus, the Rao–Stirling diversity (see Eq. 1) is a concave function and its maximization (to obtain \(I_+\)) can be computed with the help of quadratic programming (Nocedal and Wright 2006). Note that the constraints in Eqs. 2–5 constitute linear functions, which can be incorporated into the computation as linear equality and inequality constraints and do not impact its polynomial runtime complexity (Kozlov et al. 1980).
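The convexity of the quadratic form can be checked directly. As a sketch, take any two count vectors \(\mathbf {a}\) and \(\mathbf {b}\) and a weight \(\lambda \in [0, 1]\); these symbols are introduced here for illustration only, and \(\mathbf {S}\) is assumed to be symmetric. Expanding the quadratic form gives

\[
\lambda \, \mathbf {a} \, \mathbf {S} \, \mathbf {a}^\intercal + (1-\lambda ) \, \mathbf {b} \, \mathbf {S} \, \mathbf {b}^\intercal - \bigl (\lambda \mathbf {a} + (1-\lambda ) \mathbf {b}\bigr ) \, \mathbf {S} \, \bigl (\lambda \mathbf {a} + (1-\lambda ) \mathbf {b}\bigr )^\intercal = \lambda (1-\lambda ) \, (\mathbf {a}-\mathbf {b}) \, \mathbf {S} \, (\mathbf {a}-\mathbf {b})^\intercal \ge 0,
\]

where the inequality holds for all \(\mathbf {a}\), \(\mathbf {b}\) exactly when \(\mathbf {S}\) is positive semidefinite. The quadratic form therefore lies below its chords, which is the definition of convexity, and its negative is concave. Maximizing a concave function over linear constraints is a convex program, whereas minimizing it is not.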
The minimization of a concave function has significantly worse complexity and the computation of \(I_-\) lies in the class NP-hard (Pardalos and Vavasis 1991; Sahni 1974). However, we exploit the fact that the Rao–Stirling diversity is purely concave in the sense that all the eigenvalues of \(-\mathbf {S}\), the matrix of its quadratic term, are nonpositive (equivalently, all eigenvalues of the similarity matrix \(\mathbf {S}\) are nonnegative). From this follows that all local minima lie on the vertices of the polytope that is bounded by the constraints of the optimization problems (Floudas and Visweswaran 1995). A search over all possible vertices yields the global minimum in exponential time, since the polytope for optimization problem Eq. 2 has \(2^{N_{\mathcal {T}}}\) vertices, where \({N_{\mathcal {T}}}\) denotes the number of disciplines with \({N_{\mathcal {T}}}= 249\) in our case. Our constraint refinement of “Constraint refinement” section reduces the search space significantly and, apart from a more realistic uncertainty estimation, ensures the efficient computability of \(I_-\). Limiting the discipline reassignment to at most four disciplines (i.e., \(k = 4\)) limits the search space to only \(\sum _{i=1}^{4} \binom{N_{\mathcal {T}}}{i} \approx 1.6\times 10^{8}\) vertices, which can be explored exhaustively on commodity hardware. See “Computation of the Rao–Stirling index and its uncertainty interval” section for a discussion of the choice of \(k = 4\).
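The size of the reduced search space can be reproduced with a few lines of arithmetic. The sketch below only evaluates the binomial sum quoted above for \({N_{\mathcal {T}}}= 249\) and \(k = 4\) and compares it with the unbounded case; the variable names are illustrative.

```python
from math import comb

N_T = 249  # number of disciplines, as stated in the text
k = 4      # maximum number of disciplines per reassigned reference

# Vertices reachable when a reference is assigned to between 1 and k disciplines.
bounded = sum(comb(N_T, i) for i in range(1, k + 1))
# Vertices when any subset of disciplines is allowed.
unbounded = 2 ** N_T

print(f"vertices with at most {k} disciplines: {bounded:,}")  # 158,913,875
print(f"vertices without the bound: about {unbounded:.1e}")
```

The bounded count is roughly \(1.6\times 10^{8}\), in line with the figure given above, while the unbounded count has 75 decimal digits.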
The discipline pruning and the corresponding maximal spanning tree have negligible computational overhead but reduce the dimensionality of the aforementioned minimization or maximization problem even further. The computation of \(I_-\) especially benefits from this approach. For the minimum spanning tree computation, Prim’s algorithm is used (Prim 1957).
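For readers unfamiliar with Prim’s algorithm, the following is a generic textbook sketch on a dense similarity matrix, not the implementation used in this study. It builds a spanning tree that maximizes total similarity, which is equivalent to a minimum spanning tree on the dissimilarities \(1 - s_{ij}\). The four-discipline matrix and the function name are hypothetical.

```python
def maximum_spanning_tree(S):
    """Prim's algorithm on a dense symmetric similarity matrix.

    Returns the tree as a list of (i, j, similarity) edges, in the
    order in which nodes are attached to the growing tree.
    """
    n = len(S)
    in_tree = [False] * n
    best_sim = [float("-inf")] * n  # strongest link from each node to the tree
    best_from = [-1] * n            # tree node providing that link
    best_sim[0] = 0.0               # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree with the strongest link to it.
        u = max((i for i in range(n) if not in_tree[i]), key=lambda i: best_sim[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u, S[best_from[u]][u]))
        # Update the best known links of the remaining nodes.
        for v in range(n):
            if not in_tree[v] and S[u][v] > best_sim[v]:
                best_sim[v] = S[u][v]
                best_from[v] = u
    return edges

# Hypothetical similarity matrix for four disciplines.
S = [
    [1.0, 0.6, 0.1, 0.0],
    [0.6, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.5],
    [0.0, 0.1, 0.5, 1.0],
]
tree = maximum_spanning_tree(S)
print(tree)
```

This dense variant runs in time quadratic in the number of disciplines, which is negligible for a taxonomy of a few hundred categories.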
Evaluation
The evaluation of the proposed method was conducted empirically. Following the framework for knowledge integration and diffusion suggested by Liu et al. (2012), the uncertainty intervals of the interdisciplinarity of the publications of a set of individuals were calculated. Ground-truth bibliographic data provided by the authors in personal interviews was used to evaluate the method. The results of our method computed with incomplete data from digital libraries were compared with the results of the Rao–Stirling index calculated with ground-truth data.
Sample frame
The sample frame of this study consists of the publications of doctoral researchers in a Computer Science (CS) faculty of a highly ranked European university between 2009 and 2014. Doctoral researchers are usually the main authors of their publications and have a thorough knowledge of the literature they reference. We focus on CS because this field emerged as a result of integrating disciplines and it continues to be one of the most interdisciplinary fields because of its diverse applications. Moreover, CS is an ideal field to use in evaluating our method because gathering publication data with a high percentage of categorized references is especially challenging. While in other fields conferences serve as venues for community building and maintenance, in CS they focus on selectivity, quality and fast dissemination (needed in such a fast-evolving field), which drives down conference acceptance rates (Grudin 2011). Therefore, CS researchers target their publications at conferences, which are regarded as the primary means of publication in the field. Since conference publications are not associated with the taxonomy of disciplines of WoS, which we use in this analysis, a high number of uncategorized references is obtained.
Data collection
In order to gather the most complete and accurate record of publications and their references, data was gathered from different sources. First, the publication database of the university was used to collect all the publications of doctoral students of the CS faculty published between 2009 and 2014. This database contains a very exhaustive list of publications authored by those affiliated to the university, as its records are used to compute the financial assignments to the different research groups. Because the publication database of the university does not keep records of references, in the next step we gathered more data from online bibliographic databases: (1) Scopus from Elsevier, which offers high coverage of articles; and (2) WoS from Thomson Reuters, which provides a comprehensive citation search and encompasses publications of multiple online databases, resulting in multidisciplinary coverage.
The association of publications with disciplinary fields was possible using the taxonomy of disciplines of WoS, called Category Terms (CTs). It contains 249 CTs and is elaborated based on a combination of subject matter expert judgments and inter-journal citation patterns that together serve to cluster journals into topical groupings. Since there is no consensus on a perfect taxonomy of disciplines, the one of WoS was selected because of its extensive use in the bibliometric analyses of previous related work, but other taxonomies could also be used. As a measure of similarity between CTs, we used the co-citation similarity matrix provided by Porter and Rafols (2009).
The combination of several databases increases the completeness of the record of references while decreasing the percentage of publications categorized with CTs, since only journal publications indexed by WoS are categorized. Our dataset contains 1746 publications authored by 225 doctoral students. The extraction of references was possible for 1068 publications indexed by WoS or Scopus. The association of CTs with references was possible for 979 of the publications that had references indexed by WoS. A total of 12,243 references were extracted, of which 5310 are categorized with CTs.
Computation of the Rao–Stirling index and its uncertainty interval
We calculated the Rao–Stirling index and the uncertainty interval of the 1068 publications for which the extraction of references was possible. The limit of discipline reassignment for the uncertainty interval was set to \({k=4}\). This value is at the 99th percentile of the number of CTs used by WoS to categorize the journals of our dataset. The tolerance was also set to the 99th percentile of similarity between CTs (\({t=0.233}\)) in order to incorporate a slight diversity into the pool of similar CTs to be used in the reassignment procedure.
Collection of ground-truth data

Digital copies of the author’s publication and all its references which were gathered manually from digital libraries.

A printout of the taxonomy of CTs of WoS. In order to make the search of CTs easier for the participants, CTs were grouped into macrodisciplines.

Explain the importance of providing objective data. Since interdisciplinary research has a good connotation, it was important to make our participants understand that they were not going to be evaluated in terms of interdisciplinarity. We asked them to provide us with the most objective data without exaggerating interdisciplinarity or single-disciplinarity.

Make sure that participants became acquainted with the taxonomy of CTs, as none of the participants were familiar with it.

Confirm that participants understood their task. Participants were asked to think out loud and explain their choice of CTs for verification purposes.

Make sure that each participant followed the same criteria to categorize publications into disciplines.
Comparative analysis
Estimated mean and standard deviation (SD) of the Rao–Stirling index of the 48 publications of the sample calculated with incomplete and completed data. These estimated values were calculated with a bootstrapped sample of 50,000 elements with replacement.

| Rao–Stirling index | Estimated mean | SD |
| --- | --- | --- |
| Incomplete data | 0.47495 | 0.03929 |
| Completed data | 0.53862 | 0.03307 |
Estimated mean, bias and standard deviation of the indices of the 48 publications of the sample: Rao–Stirling index with completed data (first row), Rao–Stirling with incomplete data (second row), the center of the uncertainty interval (third row), and the center of the uncertainty interval weighted according to its size (fourth row). These estimated values were calculated with a bootstrapped sample of 50,000 elements with replacement. A visual representation of these values can be observed in Fig. 5.

| Diversity index | Estimated mean | Bias | SD |
| --- | --- | --- | --- |
| Rao–Stirling with completed data | 0.539 | −9.646 × 10^{−6} | 3.308 × 10^{−2} |
| Rao–Stirling with incomplete data | 0.475 | 1.390 × 10^{−4} | 3.929 × 10^{−2} |
| Center uncertainty interval | 0.569 | 2.869 × 10^{−5} | 2.964 × 10^{−2} |
| Weighted center uncertainty interval | 0.558 | 1.342 × 10^{−2} | 3.266 × 10^{−2} |
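The tables report bootstrap estimates of a mean, its bias and its standard deviation. One common way to obtain such quantities is sketched below, under the assumption that the sample of per-publication index values is resampled with replacement and the mean is recomputed for every resample; the exact procedure of the study is not specified here. The input values are randomly generated placeholders, not the study’s data, and the function name is hypothetical.

```python
import random
import statistics

def bootstrap_mean(values, n_resamples=50_000, seed=0):
    """Bootstrap the sample mean and return (estimate, bias, sd).

    estimate: mean of the resampled means
    bias:     estimate minus the mean of the original sample
    sd:       standard deviation of the resampled means
    """
    rng = random.Random(seed)
    n = len(values)
    means = [statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)]
    estimate = statistics.fmean(means)
    bias = estimate - statistics.fmean(values)
    sd = statistics.stdev(means)
    return estimate, bias, sd

# Placeholder index values for 48 publications (randomly generated).
rng = random.Random(42)
sample = [rng.uniform(0.0, 1.0) for _ in range(48)]

# Fewer resamples than in the tables, to keep the example fast.
estimate, bias, sd = bootstrap_mean(sample, n_resamples=2_000)
print(f"mean={estimate:.3f} bias={bias:.2e} sd={sd:.3f}")
```

Under this reading, the SD column describes the variability of the aggregated mean across resamples rather than the spread of the individual publications.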
Discussion
The accuracy of citation-based IDR measurements heavily depends on the quality of the bibliographic data. The combination of data from several sources might help to enhance the quality of data but it certainly does not assure ground-truth bibliographic data. The dataset gathered for the evaluation of our methods is an example of an incomplete one, even though data from three different digital libraries was extracted and combined. Not all publications of our dataset have a complete record of references, and not all references are categorized with CTs. The Rao–Stirling index is incapable of taking both problems into account as it is not designed to handle missing data.
Our method tackles the problem of uncategorized references, extending the Rao–Stirling index to encode the uncertainty caused by missing data as an interval. However, a high degree of incompleteness in publications that are particularly interdisciplinary in nature may result in underestimating the upper bound of the uncertainty interval. This is especially problematic when a publication has only one reference categorized with a single CT. Such a degree of incompleteness affects the rational redistribution of CTs needed to compute the upper endpoint of the uncertainty interval (see publication ID = 6 in Figs. 3 and 4). The main benefit of the uncertainty interval is that it acts as a confidence indicator of the results delivered by the Rao–Stirling index. On the one hand, publications with a low proportion of uncategorized references have correspondingly small uncertainty intervals, implying a more reliable measurement of the Rao–Stirling index. On the other hand, publications with a high proportion of uncategorized references have correspondingly large uncertainty intervals, indicating an unreliable measurement of the Rao–Stirling index. This finding underlines the importance of selecting publications with a proportion of categorized references above a threshold value when computing an index of interdisciplinarity, as in the analysis of Rafols et al. (2012).
The empirical evaluation of our method confirms that the acknowledgment of missing data delivers a more accurate aggregated IDR measurement than the Rao–Stirling index. Our contribution constitutes a first approach to measure IDR taking into account the inaccuracy of the bibliographic data, but other problems still affect the results of the Rao–Stirling and other IDR indices. Future analysis to evaluate this method should be conducted using other taxonomies of disciplines. Further work would be needed in order to tackle the problem of incomplete and incorrect records of references, as well as incorrect categorization of publications into disciplinary fields. Additional issues to consider are the use of a precise taxonomy of disciplines and similarity matrix. Therefore, further avenues of research towards more precise IDR indicators remain open. To aid these efforts, we are providing the source code for our implementation of the uncertainty computation to the community, which can be found at https://gitlab.com/mc.calatrava.moreno/robustrao.git.
Notes
Acknowledgments
The authors wish to thank the 48 doctoral researchers who agreed to participate in this study and generously shared their time to be interviewed.
References
Bensman, S. J., & Leydesdorff, L. (2009). Definition and identification of journals as bibliographic and subject entities: Librarianship versus ISI Journal Citation Reports methods and their effect on citation measures. Journal of the American Society for Information Science and Technology, 60(6), 1097–1117. doi: 10.1002/asi.21020.
Bjurström, A., & Polk, M. (2011). Climate change and interdisciplinarity: A co-citation analysis of IPCC third assessment report. Scientometrics, 87(3), 525–550. doi: 10.1007/s1119201103563.
Boyack, K. W., & Klavans, R. (2010). Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? Journal of the American Society for Information Science and Technology, 61(12), 2389–2404. doi: 10.1002/asi.21419.
Carley, S., & Porter, A. L. (2011). A forward diversity index. Scientometrics, 90(2), 407–427. doi: 10.1007/s1119201105281.
Chen, C., Hu, Z., Liu, S., & Tseng, H. (2012). Emerging trends in regenerative medicine: A scientometric analysis in CiteSpace. Expert Opinion on Biological Therapy, 12(5), 593–608. doi: 10.1517/14712598.2012.674507.
Chen, C., Ibekwe-SanJuan, F., & Hou, J. (2010). The structure and dynamics of co-citation clusters: A multiple-perspective co-citation analysis. Journal of the American Society for Information Science and Technology, 61(7), 1386–1409. doi: 10.1002/asi.21309.
Floudas, C. A., & Visweswaran, V. (1995). Quadratic optimization. In R. Horst & P. Pardalos (Eds.), Handbook of global optimization (Vol. 2, pp. 217–269). New York: Springer.
Grudin, J. (2011). Technology, conferences, and community. Communications of the ACM, 54(2), 41–43. doi: 10.1145/1897816.1897834.
Kozlov, M., Tarasov, S., & Khachiyan, L. (1980). The polynomial solvability of convex quadratic programming. USSR Computational Mathematics and Mathematical Physics, 20(5), 223–228. doi: 10.1016/00415553(80)900981.
Leydesdorff, L. (2007). Betweenness centrality as an indicator of the interdisciplinarity of scientific journals. Journal of the American Society for Information Science and Technology, 58(9), 1303–1319. doi: 10.1002/asi.20614.
Leydesdorff, L., Carley, S., & Rafols, I. (2013). Global maps of science based on the new Web-of-Science categories. Scientometrics, 94(2), 589–593. doi: 10.1007/s1119201207848.
Leydesdorff, L., Rafols, I., & Chen, C. (2013). Interactive overlays of journals and the measurement of interdisciplinarity on the basis of aggregated journal-journal citations. Journal of the American Society for Information Science and Technology, 64(12), 2573–2586. doi: 10.1002/asi.22946.
Liu, Y., Rafols, I., & Rousseau, R. (2012). A framework for knowledge integration and diffusion. Journal of Documentation, 68(1), 31–44. doi: 10.1108/00220411211200310.
Moed, H., Burger, W., Frankfort, J., & Van Raan, A. F. (1985). The application of bibliometric indicators: Important field- and time-dependent factors to be considered. Scientometrics, 8(3–4), 177–203. doi: 10.1007/BF02016935.
Moed, H., De Bruin, R., & Van Leeuwen, T. (1995). New bibliometric tools for the assessment of national research performance: Database description, overview of indicators and first applications. Scientometrics, 33(3), 381–422. doi: 10.1007/BF02017338.
National Research Council. (2010). Data on federal research and development investments: A pathway to modernization. Washington, DC: The National Academies Press.
Nocedal, J., & Wright, S. J. (2006). Numerical optimization (2nd ed.). New York: Springer. doi: 10.1007/9780387400655.
Pardalos, P. M., & Vavasis, S. A. (1991). Quadratic programming with one negative eigenvalue is NP-hard. Journal of Global Optimization, 1(1), 15–22. doi: 10.1007/BF00120662.
Ponomarev, I., Sulima, P., Basner, J., Jensen, U., Schnell, J., Jo, K., et al. (2013). A new approach for automated author discipline categorization and evaluation of cross-disciplinary collaborations for grant programs. In Proceedings 14th international society of scientometrics and informetrics conference (Vol. 2).
Porter, A. L., Cohen, A. S., Roessner, J. D., & Perreault, M. (2007). Measuring researcher interdisciplinarity. Scientometrics, 72(1), 117–147. doi: 10.1007/s1119200717005.
Porter, A. L., & Rafols, I. (2009). Is science becoming more interdisciplinary? Measuring and mapping six research fields over time. Scientometrics, 81(3), 719–745. doi: 10.1007/s1119200821972.
Prim, R. (1957). Shortest connection networks and some generalizations. The Bell System Technical Journal, 36(6), 1389–1401. doi: 10.1002/j.15387305.1957.tb01515.x.
Pudovkin, A. I., & Garfield, E. (2002). Algorithmic procedure for finding semantically related journals. Journal of the American Society for Information Science and Technology, 53(13), 1113–1119. doi: 10.1002/asi.10153.
Rafols, I., & Leydesdorff, L. (2009). Content-based and algorithmic classifications of journals: Perspectives on the dynamics of scientific communication and indexer effects. Journal of the American Society for Information Science and Technology, 60(9), 1823–1835. doi: 10.1002/asi.21086.
Rafols, I., Leydesdorff, L., O'Hare, A., Nightingale, P., & Stirling, A. (2012). How journal rankings can suppress interdisciplinary research: A comparison between innovation studies and business & management. Research Policy, 41(7), 1262–1282. doi: 10.1016/j.respol.2012.03.015.
Rhoades, S. A. (1993). The Herfindahl–Hirschman index. Federal Reserve Bulletin, 79(Mar), 188–189.
Sahni, S. (1974). Computationally related problems. SIAM Journal on Computing, 3(4), 262–279. doi: 10.1137/0203021.
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423. doi: 10.1002/j.15387305.1948.tb01338.x.
Stirling, A. (2007). A general framework for analysing diversity in science, technology and society. Journal of the Royal Society Interface, 4(15), 707–719. doi: 10.1098/rsif.2007.0213.
van Raan, A. (1996). Advanced bibliometric methods as quantitative core of peer review based evaluation and foresight exercises. Scientometrics, 36(3), 397–420. doi: 10.1007/BF02129602.
Wagner, C. S., Roessner, J. D., Bobb, K., Klein, J. T., Boyack, K. W., Keyton, J., et al. (2011). Approaches to understanding and measuring interdisciplinary scientific research (IDR): A review of the literature. Journal of Informetrics, 5(1), 14–26. doi: 10.1016/j.joi.2010.06.004.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.