Introduction

When setting policies for science and technology innovation and planning research strategies, it is important to analyze the current research capabilities of researchers and research institutes and to predict future trends in related research fields. In discussions on science and technology innovation in Japan, interdisciplinarity is considered particularly important for research conducted at universities. To investigate interdisciplinary research for policy and strategy planning, Japanese government agencies use analyses by the National Institute of Science and Technology Policy (NISTEP) that are based on indicators of interdisciplinarity such as cited references [Cabinet Office of Japan, 2016, 2019, 2020; Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT), 2018].

In a study measuring interdisciplinarity, Porter et al. (2007) proposed two indices: integration and specialization. Integration describes the degree to which a research article cites articles from other subject categories in Web of Knowledge (the precursor to Web of Science), whereas specialization measures the spread of subject categories in which a body of research (such as the work of a given author in a set time period) is published. Porter & Rafols (2009) then investigated changes in the degree of interdisciplinarity in six research areas from 1975 to 2005 and found that although scientific research is becoming more interdisciplinary, progress is small and citations are largely from neighboring fields. Leydesdorff & Rafols (2011) studied three indicators of interdisciplinarity: the Shannon entropy (Shannon, 1948), the betweenness centrality (Freeman, 1977, 1978/1979), and the Rao-Stirling measures (Rao, 1982; Stirling, 2007). Although no single indicator clearly measured the interdisciplinarity of research, they each captured a different aspect of interdisciplinary field integration. Other studies used Hill-type true diversity to measure the diversity of cited references (Jost, 2006, 2007, 2009; Leinster & Cobbold, 2012), and it has been suggested that this indicator may be a useful measure of the degree of interdisciplinarity (Zhang et al., 2016). A more detailed discussion of these indices can be found in Wagner et al. (2011), and a review of the broader context of scientometrics is available in Mingers & Leydesdorff (2015). Kim et al. (2022) have presented a methodological framework for analyzing topic-based interdisciplinarity. The framework is not discipline-specific, but serves as a guide for identifying the characteristics of and relationships among topics that are actively discussed in highly interdisciplinary research areas.

To determine the impact of interdisciplinary research, Larivière & Gingras (2010) conducted a citation rate study, defining the degree of field integration of an individual paper as the percentage of its citations to journals in other disciplines. From that study, the authors noted that (i) across all disciplines, there is no clear correlation between the degree of interdisciplinarity and citation rates; (ii) the more specialized a discipline is, the higher the citation rate of specific fields; and (iii) papers with the highest degrees of either specialization or interdisciplinarity have smaller scientific (citation) impact. From an impact perspective, therefore, there is an optimal degree of field integration. In a study investigating citations at the journal level, Silva et al. (2013) sought to quantify field integration by measuring the entropy of the distribution of fields among the journals that cite a particular journal. The results indicated that scientific fields are becoming increasingly interdisciplinary and that the degree of interdisciplinarity (entropy) is strongly correlated with the journal impact factor (IF) (high entropies were obtained for journals with very wide readership). Kong & Wang (2020) compared citation counts and Altmetric scores between covered and non-covered articles published in Nature. Their empirical study showed that covered papers receive significantly more citations than non-covered papers, although the difference has become smaller in recent years. They also noted that in the biological sciences, physical sciences, and other interdisciplinary areas, papers with high citation counts and high Altmetric scores were relatively more interdisciplinary. More recently, Chen et al. (2021) indicated that variety, rather than balance or disparity, is likely the most important interdisciplinary factor for citation impact.

Other studies limit their assessment of interdisciplinarity to specific disciplines. One notable example is a study of the NASA Astrobiology Institute at the University of Hawaii that was based on bibliometric methodology and machine learning algorithms (Gowanlock & Gazan, 2013). Another notable example is a study predicting the degree of interdisciplinarity in nanotechnology (Jang et al., 2018), in which the authors proposed a framework that uses the Glänzel-Schubert-Schoepflin model to estimate the citation rate with stochastic processes. Regarding approaches to increasing interdisciplinarity, Levy et al. (2005) identified elements that they believe will help to further enhance interdisciplinarity in life course research. In the field of bio-nanoscience, a case study assessing the Leinster-Cobbold diversity indices noted that the various measures of field integration are special cases of these indices (Mugabushaka et al., 2016). Across major medical subspecialties, an interdisciplinarity score was positively correlated with the journal impact factor, indicating that interdisciplinary research has greater impact (Petterson et al., 2021). In social science research, Brink et al. (2020) focused on interdisciplinary measures of sustainability, reviewing them and investigating whether measurement issues differ among the environmental, economic, and social dimensions. For interdisciplinary research spanning the human and natural sciences, Pittman et al. (2016) described 20 years of experience in motivating scientists to engage in interdisciplinary research, providing an environment for learning across disciplines, and structuring research programs to advance knowledge for decision making on global change.

Previous studies on the degree of field integration in academic research have discussed assessment indices, the impact of field integration, and case studies of specific research institutions and disciplines. However, there has been no specific, quantitative discussion of the extent to which field integration is realized in a particular field, including the other fields with which it is, or is not, integrated. For the “adjacent” areas mentioned above (Porter et al., 2007), the degree of integration between a particular field and other fields has yet to be shown quantitatively. Disparity, one of the Rao-Stirling measures, is defined as the average distance between fields and is calculated for a set of publications using their cited references.

In this study, we propose the concept of “affinity” between fields in academic research. We define the affinity of one field for another as the number of journals assigned to both fields divided by the number of journals assigned to the first of the two fields. To calculate the affinity between fields, we use information on the research fields of all academic journals registered in Scopus. One difference between affinity and disparity lies in the analysis target: affinity is calculated for a set of journals, whereas disparity is calculated from individual publications. Thus, citation relations affect the magnitude of disparity but not that of affinity. We also show that the affinity between fields carries two-way information, that is, the percentage of journals in other fields that are also assigned to the field of interest and the percentage of journals in the field of interest that are also assigned to other fields. This kind of directedness is not included in the concept of disparity, but we identify it as a critical factor in the context of interdisciplinarity.

As University Research Administrators (URAs), we are often exposed to top-down research evaluation indicators in the form of policies, university research strategies, and so on. We often find that the quantitative aspects of research interdisciplinarity are unclear in these discussions. One motivation for this research is to address this issue from a URA's point of view.

This paper is organized as follows. In Section “Affinity between fields”, we define the affinity between fields. In Section “Data and the method of analysis”, we explain the data and the method of analysis and then visualize the affinity. In Section “Discussion”, we present detailed results and a discussion. In Section “An example of application”, we give an example of application. In Section “Limitations”, we outline the limitations of our analyses and results. In Section “Summary”, we summarize our results.

Affinity between fields

In measuring the interdisciplinarity of research, many studies based on citation analyses of articles show that some fields of research are more likely to be interdisciplinary than others, without taking into account the academic affinity between fields (Jost, 2006, 2007, 2009; Leinster & Cobbold, 2012; Leydesdorff & Rafols, 2011; Porter & Rafols, 2009; Porter et al., 2007; Zhang et al., 2016). The question arises: Can we treat all intrinsic affinities between fields equally? For example, we may infer a strong relationship between the fields of chemistry and chemical engineering from the keyword ‘chemistry’, but it is more difficult to find such relationships for the field of arts and humanities. This means that before discussing the degree of interdisciplinarity, it is necessary to measure the affinity between fields. Therefore, in this study, we develop a new approach for quantifying the affinity between fields.

Methods of network analysis are often used in analyses of the interdisciplinarity of research (e.g., for determining relationships between keywords in academic research). Four main bibliometric techniques are used to construct bibliometric networks: co-citation, bibliographic coupling, co-authorship, and co-word analysis (Cobo et al., 2011). By contrast, our approach to quantifying the affinity between fields introduces a new technique: co-assigned fields. The unit of analysis is the journal, and relationships are characterized by co-assigned fields. In both respects, our approach differs from disparity in the Rao-Stirling measures, which looks at individual papers and characterizes relationships by citations (referenced literature). This choice of technique, unit of analysis, and relationship characterization underpins our method for quantifying the affinity between fields. We also visualize our results in a way that preserves quantitative information, is easy to reproduce, and is visually clear. We return to this point later.

We define the affinity between fields using the information about fields that is assigned to academic journals. Our definition is applicable in cases where academic journals are assigned to one or more fields. We define the affinity of field i for field j (i ≠ j), \({A}_{ij}\), as

$$\begin{array}{c}{A}_{ij}\equiv \frac{\text{Number of journals assigned to both } i \text{ and } j}{\text{Number of journals assigned to } i}\times 100\end{array}$$
(1)

Note that \({A}_{ij}\) is not in general equal to \({A}_{ji}\): \({A}_{ij}\) represents the affinity of field i for field j, while \({A}_{ji}\) represents the affinity of field j for field i. The factor of 100 expresses the affinity as a percentage for convenience. The calculation and properties of the affinity are shown in the next section. This affinity is not a direct measure of the interdisciplinarity of research; rather, it serves as a weight reflecting the potential ‘distance’ between fields that can be incorporated into interdisciplinarity measures. We present this argument in Section “An example of application”, using similarity as an example.
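As an illustration of Eq. (1), the following Python sketch computes \(A_{ij}\) directly from a list of journals, each represented as a set of field labels. The journal list is hypothetical toy data, not the Scopus source title list, and the function name `affinity` is ours. The sketch shows that \(A_{ij}\) and \(A_{ji}\) share a numerator but have different denominators.

```python
from typing import Iterable, Set


def affinity(journals: Iterable[Set[str]], i: str, j: str) -> float:
    """Eq. (1): percentage of journals assigned to field i that are also assigned to field j."""
    if i == j:
        raise ValueError("Eq. (1) is defined only for i != j")
    n_i = 0   # journals assigned to field i
    n_ij = 0  # journals assigned to both i and j
    for fields in journals:
        if i in fields:
            n_i += 1
            if j in fields:
                n_ij += 1
    if n_i == 0:
        raise ValueError(f"no journal is assigned to field {i!r}")
    return n_ij / n_i * 100


# Hypothetical toy data: each set lists the fields assigned to one journal.
toy_journals = [
    {"MEDI"},
    {"MEDI"},
    {"MEDI", "NEUR"},
    {"NEUR"},
    {"MEDI", "NURS"},
]

print(affinity(toy_journals, "MEDI", "NEUR"))  # 1 of 4 MEDI journals -> 25.0
print(affinity(toy_journals, "NEUR", "MEDI"))  # 1 of 2 NEUR journals -> 50.0
```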

Data and the method of analysis

To derive the affinity between fields, we use the field information for all academic journals in Elsevier's Scopus database. This abstract and citation database of peer-reviewed literature covers all academic fields and contains journals, conference proceedings, and books. Every academic journal registered in Scopus is assigned one or more fields from the All Science Journal Classification (ASJC). The top level of this classification system has 27 fields (Table 1), and the bottom level has 334 subcategories. These fields belong to four broader subject areas (Life Sciences, Social Sciences, Physical Sciences, and Health Sciences), except for the field General, which includes journals covering science in general. We use Scopus data because its number of journals and its 27 top-level fields are sufficient to illustrate our new concept of the affinity between fields. Moreover, Scopus data can be downloaded by anyone, which makes our results reproducible.

Table 1 ASJC codes and subject areas for the 27 fields assigned to academic journals in Scopus

An analysis using more detailed classifications could be valuable, but expanding beyond the 27 fields used here would require care, because some of the ASJC subcategories have confusing labels (e.g., near-identical fields such as “Linguistics and Language” and “Language and Linguistics”), as pointed out by Wang & Waltman (2016). Such an expansion is beyond the scope of this study, but we hope to conduct it in the future. We also note that our method for quantifying the affinity between fields can use classification systems other than ASJC to define fields.

Dataset

The quantification of affinity between fields is based on a dataset of 39,743 journals registered in Scopus as of September 2019. The journals were published from 1924 to September 2019. The dataset was retrieved from Scopus by downloading the Source title list (XLSX format; https://www.elsevier.com/solutions/scopus/how-scopus-works/content).

Each academic journal has at least one assigned field. To measure the affinity of a given field for other fields, we aggregated the data into a symmetric matrix indexed by the 27 ASJC fields. An advantage of using a matrix instead of a network visualization is that the matrix directly represents the frequency of links between fields. The matrix is constructed as follows:

  • If a given journal has only one assigned field, 1 is added to the corresponding diagonal element (e.g., if only the field “General” is assigned, 1 is added to the element in the “General” row and “General” column).

  • If a given journal has more than one assigned field, 1 is added to every diagonal element corresponding to those fields.

  • If a given journal has two assigned fields, 1 is added to the four elements of the matrix representing the combinations of the two fields (e.g., if “General” and “Chemistry” are assigned, 1 is added to the element in the “General” row and “General” column (a diagonal element), in the “Chemistry” row and “Chemistry” column (a diagonal element), in the “General” row and “Chemistry” column, and in the “Chemistry” row and “General” column).

  • If a given journal has three assigned fields, 1 is added to the elements representing all nine pairwise combinations of the three fields (e.g., if “General,” “Chemistry,” and “Energy” are assigned, 1 is added to the element in the “General” row and “General” column, in the “Chemistry” row and “Chemistry” column, in the “Energy” row and “Energy” column, in the “General” row and the “Chemistry” column, in the “Chemistry” row and the “General” column, in the “General” row and the “Energy” column, in the “Energy” row and the “General” column, in the “Chemistry” row and the “Energy” column, and in the “Energy” row and the “Chemistry” column).

  • If a given journal has four or more assigned fields, 1 is added to the corresponding diagonal and off-diagonal elements in the same way as is used for journals with three assigned fields.

We performed these operations for all 39,743 journals to complete one matrix (see Fig. 7 in the Appendix). The diagonal elements of this matrix give the total number of journals in each field, and the off-diagonal elements give the number of journals assigned to each pair of fields, which enables us to see the distribution of other fields with respect to a given field.
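Taken together, the construction rules above amount to adding 1 to every ordered pair of a journal's assigned fields, including each field paired with itself. The following Python sketch implements that reading; the function name and the three-field toy journal list are hypothetical, and real input would come from the ASJC codes in the Scopus source title list.

```python
from itertools import product
from typing import Dict, List, Sequence, Set


def co_assignment_matrix(journals: Sequence[Set[str]], fields: Sequence[str]) -> List[List[int]]:
    """Symmetric count matrix A^(1).

    Entry [a][b] counts journals assigned to both fields a and b; the diagonal
    entry [a][a] counts all journals assigned to field a.
    """
    index: Dict[str, int] = {f: k for k, f in enumerate(fields)}
    n = len(fields)
    counts = [[0] * n for _ in range(n)]
    for assigned in journals:
        ks = [index[f] for f in assigned]
        # Add 1 to every ordered pair of the journal's fields, including each
        # field with itself: 1 cell for one field, 4 for two, 9 for three, ...
        for a, b in product(ks, repeat=2):
            counts[a][b] += 1
    return counts


fields = ["MEDI", "NEUR", "NURS"]  # hypothetical subset of the 27 ASJC fields
toy_journals = [{"MEDI"}, {"MEDI"}, {"MEDI", "NEUR"}, {"NEUR"}, {"MEDI", "NURS"}]

for name, row in zip(fields, co_assignment_matrix(toy_journals, fields)):
    print(name, row)
```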

We then normalized the values to take into account the number of journals in each field (Fig. 8 in the Appendix): each row of the 27 × 27 symmetric matrix in Fig. 7 was divided by its diagonal element, so that the diagonal elements become 100. Each matrix element in Fig. 8, \({A}_{ij}^{(2)}\), is obtained from the values in Fig. 7, \({A}_{ij}^{(1)}\), as

$${A}_{ij}^{(2)}=\frac{{A}_{ij}^{(1)}}{{A}_{ii}^{(1)}}\times 100$$

where \(i\) and \(j\) index the matrix rows and columns, respectively, in numerical order of their ASJC classification codes. In this study, we call \({A}_{ij}^{(2)}\) the affinity between fields; for \(i\ne j\), it equals \({A}_{ij}\) in Eq. (1), because \({A}_{ij}^{(1)}\) counts the journals assigned to both i and j and \({A}_{ii}^{(1)}\) counts the journals assigned to i. The source data for this affinity are updated in Scopus approximately once every 3 months; a continuing, precise analysis using the affinity therefore requires updating the data at the same interval.
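A minimal sketch of this normalization in Python, assuming the count matrix \(A^{(1)}\) is stored as a list of rows in ASJC order. The 3 × 3 counts below are hypothetical toy values for (MEDI, NEUR, NURS), not Scopus data. Dividing each row by its own diagonal element reproduces Eq. (1) for the off-diagonal elements and makes the result asymmetric.

```python
from typing import List, Sequence


def normalize_by_diagonal(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """A^(2)_ij = A^(1)_ij / A^(1)_ii * 100: divide each row by its own diagonal element."""
    result = []
    for i, row in enumerate(counts):
        diag = row[i]
        if diag == 0:
            raise ValueError(f"row {i}: no journals assigned to this field")
        result.append([value / diag * 100 for value in row])
    return result


# Hypothetical A^(1) for (MEDI, NEUR, NURS); symmetric, like Fig. 7.
a1 = [[4, 1, 1],
      [1, 2, 0],
      [1, 0, 1]]
a2 = normalize_by_diagonal(a1)
print(a2[0][1], a2[1][0])  # A^(2)_MEDI,NEUR vs A^(2)_NEUR,MEDI
```

Row i of the result holds the affinity of field i to the other fields, and column i holds the other fields' affinity to field i.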

Note that this affinity between fields is normalized so that the diagonal elements are 100. Unlike Fig. 7, Fig. 8 is not a symmetric matrix, and its off-diagonal elements have a different meaning: \({A}_{ij}^{(1)}={A}_{ji}^{(1)}\), but in general \({A}_{ij}^{(2)}\ne {A}_{ji}^{(2)}\) because \({A}_{ii}^{(1)}\ne {A}_{jj}^{(1)}\). When i is the field of interest, \({A}_{ij}^{(2)}\) is the percentage of all journals assigned to field i that are also assigned to field j, and \({A}_{ji}^{(2)}\) is the percentage of all journals assigned to field j that are also assigned to field i. The column of a field of interest, i, therefore contains the affinity of each other field, j, to i. For example, where column ‘MEDI’ (i = Medicine) intersects row ‘NEUR’ (j = Neuroscience) in Fig. 8, \({A}_{ji}^{(2)}=58.2550\). This means that about 58.3% of the academic journals assigned to Neuroscience are also assigned to Medicine. By contrast, the row of a field of interest, i, contains the affinity of i to each other field, j, indicated in the relevant columns. For example, the value at the intersection of the ‘MEDI’ row and the ‘NEUR’ column in Fig. 8 is \({A}_{ij}^{(2)}=3.2110\). This means that about 3.2% of the academic journals assigned to Medicine are also assigned to Neuroscience. It is also important to note that the affinity cannot determine the field to which individual articles within these journals belong.

In terms of a bibliometric network, each field corresponds to a node and each off-diagonal element of the matrix to a link, so the magnitudes of the off-diagonal elements can serve as link weights. We also found that, for a given field of interest, two types of affinity between fields can be identified: the affinity of other fields to the field of interest (shown in the columns of Fig. 8) and the affinity of the field of interest to other fields (shown in the rows). In general, these two values differ even for the same pair of fields. Therefore, in discussing affinity between fields, it is necessary to specify both the field of interest, i, and the type of affinity (\({A}_{ij}^{(2)}\) or \({A}_{ji}^{(2)}\)). In this way, the bidirectional affinity between fields can be understood as flows from other fields to the field of interest as well as from the field of interest to other fields (Fig. 1), with the magnitude of each flow given by the degree of affinity.

Fig. 1

The affinity between fields is bidirectional: the percentage of the field of interest assigned in other fields is not necessarily the same as the percentage of journals of other fields assigned in the field of interest. The center line corresponds to the field of interest

These degrees of affinity can be illustrated using a Sankey diagram, as shown in Fig. 2 (Sankey diagrams for all fields can be downloaded from https://data.mendeley.com/datasets/gx8g4mfk7x/draft?a=aeb9aad2-0b12-4c2b-8e0a-1aea5b90f522). Map-type networks or cyclized maps (e.g., as shown in Boyack et al., 2005; Boyack & Klavans, 2014; Klavans & Boyack, 2006, 2009, 2011; Börner & Scharnhorst, 2009; Börner et al., 2012) may also be used. Although map-type networks and cyclized maps can provide an overall picture of relationships, they convey less quantitative information than our Sankey diagrams, because we use a separate diagram for each field of interest.
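The two sides of a Sankey diagram such as Fig. 2 can be read directly off the normalized matrix: the left side from the column of the field of interest and the right side from its row. The Python sketch below assembles these link lists. The function name, the toy matrix, and the choice to drop zero-valued links are our assumptions; the plotting tool used for Fig. 2 is not specified here.

```python
from typing import Dict, List, Sequence, Tuple


def sankey_sides(a2: Sequence[Sequence[float]], fields: Sequence[str],
                 k: int) -> Dict[str, List[Tuple[str, float]]]:
    """Link lists for a Sankey diagram centered on field k.

    left:  other fields' affinity to field k  (column k, A^(2)_jk)
    right: field k's affinity to other fields (row k,    A^(2)_kj)
    Zero-valued links are omitted.
    """
    others = [j for j in range(len(fields)) if j != k]
    left = [(fields[j], a2[j][k]) for j in others if a2[j][k] > 0]
    right = [(fields[j], a2[k][j]) for j in others if a2[k][j] > 0]
    return {"left": left, "right": right}


fields = ["MEDI", "NEUR", "NURS"]
a2 = [[100.0, 25.0, 25.0],   # hypothetical normalized matrix
      [50.0, 100.0, 0.0],
      [100.0, 0.0, 100.0]]
print(sankey_sides(a2, fields, k=0))
```

The resulting (field, value) pairs can then be handed to a Sankey plotting tool; for example, Plotly's `plotly.graph_objects.Sankey` trace takes node labels plus link `source`, `target`, and `value` arrays.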

Fig. 2

A Sankey diagram for Materials Science as the field of interest. The field of interest is represented as the center column. To the left are other fields with affinity to Materials Science; to the right are fields to which Materials Science has affinity

Discussion

For each of the 27 fields, we determined the top three fields with the highest affinity to the field of interest as well as the top three fields to which the field of interest has the highest affinity (Table 3 in the Appendix).
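The top-three lists in Table 3 can be obtained by sorting the off-diagonal elements of the relevant column (other fields' affinity to the field of interest) and row (affinity of the field of interest to other fields). Here is a Python sketch using a hypothetical 4 × 4 matrix with placeholder field labels. Ties are kept in matrix order, which is our assumption.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def top_k_affinities(a2: Sequence[Sequence[float]], fields: Sequence[str],
                     i: int, k: int = 3) -> Tuple[Ranked, Ranked]:
    """Return (top fields with affinity to field i, top fields that field i has affinity to)."""
    others = [j for j in range(len(fields)) if j != i]
    # Column i: A^(2)_ji, other fields' affinity to the field of interest.
    to_i = sorted(((fields[j], a2[j][i]) for j in others), key=lambda t: t[1], reverse=True)
    # Row i: A^(2)_ij, affinity of the field of interest to other fields.
    from_i = sorted(((fields[j], a2[i][j]) for j in others), key=lambda t: t[1], reverse=True)
    return to_i[:k], from_i[:k]


fields = ["F1", "F2", "F3", "F4"]   # placeholder labels
a2 = [[100.0, 30.0, 10.0, 5.0],     # hypothetical A^(2)
      [60.0, 100.0, 20.0, 0.0],
      [15.0, 40.0, 100.0, 8.0],
      [50.0, 0.0, 25.0, 100.0]]
to_f1, from_f1 = top_k_affinities(a2, fields, i=0)
print(to_f1)    # other fields' affinity to F1
print(from_f1)  # F1's affinity to other fields
```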

Of the 5 fields in the subject area of Life Sciences, “Immunology and Microbiology” has the highest affinity (28.8) to “Biochemistry, Genetics and Molecular Biology”, indicating that 28.8% of journals in the field of “Immunology and Microbiology” are also in the field of “Biochemistry, Genetics and Molecular Biology”. (The other affinity values below should likewise be interpreted as percentages of journals assigned to a field.) “Biochemistry, Genetics and Molecular Biology” also has the highest affinity (8.40) to “Immunology and Microbiology”. Four of the five fields in Life Sciences have the highest affinity to “Medicine”, which is categorized in Health Sciences: “Biochemistry, Genetics and Molecular Biology” (43.1), “Immunology and Microbiology” (57.5), “Neuroscience” (58.3), and “Pharmacology, Toxicology and Pharmaceutics” (43.8).

Of the 6 fields in Social Sciences, “Arts and Humanities” has the highest affinity (58.1) to “Social Sciences”, and “Social Sciences” has the highest affinity (35.5) to “Arts and Humanities”. These values are also the highest from the standpoint of each field. Three fields have the highest affinity to “Social Sciences”: “Arts and Humanities” (58.1), “Business, Management and Accounting” (25.0), and “Psychology” (37.1).

Of the 10 fields in Physical Sciences, “Chemistry” and “Chemical Engineering” have the highest affinity to each other (27.2 and 32.3, respectively), as do “Environmental Sciences” and “Earth and Planetary Sciences” (33.6 and 37.6, respectively). For “Chemical Engineering”, “Environmental Sciences”, and “Earth and Planetary Sciences”, the affinities to “Chemistry” (32.3), “Earth and Planetary Sciences” (33.6), and “Environmental Sciences” (37.6), respectively, are also the highest from the standpoint of each field.

Of the 5 fields in Health Sciences, “Nursing”, “Dentistry”, and “Health Professions” have the highest affinity to “Medicine” (62.5, 32.1, and 67.2, respectively). By contrast, “Medicine” as the field of interest has low affinity for fields in Health Sciences.

In some cases, the field with the highest affinity to the field of interest is also the field to which the field of interest has the highest affinity. This is the case for nine fields of interest (presented as field of interest: field with highest affinity):

  • “Arts and Humanities”: “Social Sciences”

  • “Chemical Engineering”: “Chemistry”

  • “Earth and Planetary Sciences”: “Environmental Sciences”

  • “Economics, Econometrics and Finance”: “Business, Management and Accounting”

  • “Engineering”: “Materials Science”

  • “Environmental Sciences”: “Earth and Planetary Sciences”

  • “Social Sciences”: “Arts and Humanities”

  • “Veterinary”: “Agricultural and Biological Sciences”

  • “Dentistry”: “Medicine”
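Whether a field of interest belongs to this list can be checked by comparing the largest off-diagonal element in its column with the largest one in its row. The Python sketch below performs this check on a hypothetical 4 × 4 matrix with placeholder labels. Ties are resolved by taking the first field in matrix order, which is an assumption not stated in the text.

```python
from typing import List, Sequence, Tuple


def reciprocal_top_fields(a2: Sequence[Sequence[float]],
                          fields: Sequence[str]) -> List[Tuple[str, str]]:
    """Fields whose top in-affinity field (column max) equals their top out-affinity field (row max)."""
    pairs = []
    n = len(fields)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        top_to_i = max(others, key=lambda j: a2[j][i])    # largest A^(2)_ji
        top_from_i = max(others, key=lambda j: a2[i][j])  # largest A^(2)_ij
        if top_to_i == top_from_i:
            pairs.append((fields[i], fields[top_to_i]))
    return pairs


fields = ["F1", "F2", "F3", "F4"]   # placeholder labels
a2 = [[100.0, 30.0, 10.0, 5.0],     # hypothetical A^(2)
      [60.0, 100.0, 20.0, 0.0],
      [15.0, 40.0, 100.0, 8.0],
      [50.0, 0.0, 25.0, 100.0]]
print(reciprocal_top_fields(a2, fields))
```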

Affinity to “Medicine” is particularly high, as this field appears in the top position for seven fields of interest: “Biochemistry, Genetics and Molecular Biology”, “Immunology and Microbiology”, “Neuroscience”, “Pharmacology, Toxicology and Pharmaceutics”, “Nursing”, “Dentistry”, and “Health Professions”.

Based on our values for the affinity between fields, we calculated the mean and median values for other fields’ affinity to the field of interest (matrix elements in columns in Fig. 8) and for the affinity of the field of interest for other fields (matrix elements in rows), which are shown in Fig. 3 and Fig. 4, respectively. Each figure includes box plots of affinity for each of the 27 fields of interest; circles indicate the other fields’ affinity to the field of interest (Fig. 3 and Table C) or the affinity of the field of interest to other fields (Fig. 4 and Table D), while crosses represent the mean value.
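The summary statistics behind Figs. 3 and 4 can be sketched in Python as follows. We assume here that the diagonal element (always 100) is excluded, because it is not an affinity to another field. The matrix and labels are hypothetical toy data.

```python
from statistics import mean, median
from typing import Dict, Sequence, Tuple


def affinity_summary(a2: Sequence[Sequence[float]],
                     fields: Sequence[str]) -> Dict[str, Tuple[float, float, float, float]]:
    """Per field of interest: (column mean, column median, row mean, row median),
    excluding the diagonal element."""
    summary = {}
    n = len(fields)
    for i, name in enumerate(fields):
        column = [a2[j][i] for j in range(n) if j != i]  # other fields -> field i (cf. Fig. 3)
        row = [a2[i][j] for j in range(n) if j != i]     # field i -> other fields (cf. Fig. 4)
        summary[name] = (mean(column), median(column), mean(row), median(row))
    return summary


fields = ["F1", "F2", "F3", "F4"]   # placeholder labels
a2 = [[100.0, 30.0, 10.0, 5.0],     # hypothetical A^(2)
      [60.0, 100.0, 20.0, 0.0],
      [15.0, 40.0, 100.0, 8.0],
      [50.0, 0.0, 25.0, 100.0]]
for name, stats in affinity_summary(a2, fields).items():
    print(name, [round(v, 2) for v in stats])
```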

Fig. 3

Mean and median affinity of other fields to the field of interest (using values from columns in Fig. 8)

Fig. 4

Mean and median affinity of the field of interest to other fields (using values from rows in Fig. 8)

Within each subject area in Fig. 3, the mean and median affinity are highest for “Biochemistry, Genetics and Molecular Biology”, “Social Sciences”, “Engineering”, and “Medicine”. These fields, which have the largest numbers of academic journals in their respective subject areas, are positioned as core fields and are clearly distinguished from the other fields in Fig. 3. In Fig. 4, by contrast, there is no marked difference between these four core fields and the remaining fields in the central part of the distribution of each field of interest's affinity to other fields (the first and third quartiles, and the mean and median values). Meanwhile, high affinities to core fields often appear as outliers for individual fields of interest, such as the affinity of “Arts and Humanities” to “Social Sciences” (58.1), of “Materials Science” to “Engineering” (47.9), and of “Nursing” and “Health Professions” to “Medicine” (62.5 and 67.2, respectively). The affinities of “Immunology and Microbiology” (57.5) and “Neuroscience” (58.3) to “Medicine” are also outliers, in this case for fields of interest from another subject area (Life Sciences).

These observations can be confirmed quantitatively by measuring the correlations between the number of journals and the affinity in each field. Figures 5 and 6 show the correlations between the number of journals in each field (using values in Fig. 7) and the affinity of other fields to the field of interest (Fig. 5: using values in Fig. 3) and of the field of interest to other fields (Fig. 6: using values in Fig. 4), respectively. The upper and lower plots represent the correlations of the number of journals with the mean and median values of affinity, respectively. Each point in the figures corresponds to one field. Figure 5 shows that the number of journals in each field is correlated with both the mean and the median of affinity: the correlation coefficients are 0.933 (\(p\cong 1.35\times {10}^{-12}\)) and 0.882 (\(p\cong 1.21\times {10}^{-9}\)), respectively, indicating a strong positive correlation. By contrast, Fig. 6 shows no strong correlation: the correlation coefficients between the number of journals and the mean and median of affinity in each field are -0.385 (\(p\cong 0.0472\)) and 0.00177 (\(p\cong 0.993\)), respectively, indicating only a weak negative correlation with the mean and essentially no correlation with the median.
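The text does not name the correlation coefficient. The following sketch assumes Pearson's r together with the usual test statistic \(t = r\sqrt{n-2}/\sqrt{1-r^2}\), which is compared against Student's t distribution with \(n-2\) degrees of freedom to obtain a two-sided p-value. With n = 27 fields, r = 0.933 gives t ≈ 13.0 and r = -0.385 gives t ≈ -2.09. For real data, `scipy.stats.pearsonr(x, y)` returns both r and the p-value.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), to be compared with Student's t on n - 2 d.f."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# With n = 27 fields, the mean-affinity coefficients for Figs. 5 and 6 correspond to:
print(round(t_statistic(0.933, 27), 2))
print(round(t_statistic(-0.385, 27), 2))
```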

Fig. 5

Correlations between the numbers of journals of each field (using values in Fig. 7) and mean (upper figure) and median (lower one) values of affinity of other fields to the field of interest (using values in Fig. 3)

Fig. 6

Correlations between the numbers of journals of each field (using values in Fig. 7) and mean (upper figure) and median (lower one) values of affinity of the field of interest to other fields (using values in Fig. 4)

An example of application

Because the Salton cosine similarity (Salton & McGill, 1983) is one of the most fundamental measures used in studies of interdisciplinarity, we examine the extent to which the affinity obtained in Section “Data and the method of analysis” affects this similarity. To demonstrate this clearly, we consider a simple data sample: the following table (matrix) for three fields, Medicine (MEDI), Neuroscience (NEUR), and Nursing (NURS). To illustrate the impact of the affinity, the numbers of papers for the three fields were randomly generated:

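Fields of target papers (rows) / fields of papers cited by the target papers (columns):

$$\begin{array}{l|rrr} & \mathrm{MEDI} & \mathrm{NEUR} & \mathrm{NURS} \\ \hline \mathrm{MEDI} & 1268 & 4145 & 5476 \\ \mathrm{NEUR} & 2058 & 2848 & 6195 \\ \mathrm{NURS} & 6842 & 2993 & 4884 \end{array}$$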

Here, the labels (fields) of the rows indicate the assigned fields of a set of target papers, each of which will usually cite several papers, and the labels (fields) of the columns indicate the fields of the papers cited by those target papers. The values in the table (matrix) are numbers of papers, randomly generated using RANDBETWEEN(0, 10000) in Microsoft Excel. Using the table, the non-trivial (≠ 1) similarities \({S}_{ij}\) among the three target fields (where i and j stand for fields, in this case MEDI, NEUR, and NURS) are calculated as follows,

$$\begin{array}{c}{S}_{\mathrm{MEDI},\mathrm{ NEUR}}\fallingdotseq 0.972, {S}_{\mathrm{MEDI},\mathrm{ NURS}}\fallingdotseq 0.767, {S}_{\mathrm{NEUR},\mathrm{ NURS}}\fallingdotseq 0.795,\end{array}$$
(2)

where the similarity is defined as

$$\begin{array}{c}{S}_{ij}\equiv \frac{\sum_{f\in p}{x}_{if}{x}_{jf}}{\sqrt{\sum_{f\in p}{x}_{if}^{2}\sum_{f\in p}{x}_{jf}^{2}}}.\end{array}$$
(3)

The similarity is invariant under the interchange of i and j (it is symmetric: \({S}_{ij}={S}_{ji}\)). Here, i, j, and f denote fields, which in general are labeled by the 27 ASJC fields shown in Table 1; \({x}_{if}\) is the entry in row i and column f of the table above, that is, the number of cited papers in field f for the target papers in field i; and p is the set of fields under consideration (all 27 ASJC fields in general, or MEDI, NEUR, and NURS in the present example). In this example, we find

$$\begin{array}{*{20}c} {S_{{\text{MEDI, NEUR}}} > S_{{\text{NEUR, NURS}}} > S_{{\text{MEDI, NURS}}} .} \\ \end{array}$$
(4)
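Eq. (3) applied to the table above can be checked with a few lines of Python. The sketch treats each row of the table as the vector \(x_{i\cdot}\); the dictionary and function names are ours.

```python
import math
from typing import Dict, List, Sequence

# The table above: rows are fields of target papers, columns are fields of
# cited papers, both in the order (MEDI, NEUR, NURS).
counts: Dict[str, List[int]] = {
    "MEDI": [1268, 4145, 5476],
    "NEUR": [2058, 2848, 6195],
    "NURS": [6842, 2993, 4884],
}


def cosine_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Eq. (3): Salton cosine between two rows of the table."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


for i, j in [("MEDI", "NEUR"), ("MEDI", "NURS"), ("NEUR", "NURS")]:
    print(i, j, round(cosine_similarity(counts[i], counts[j]), 3))
```

Run as written, this reproduces the values 0.972 for (MEDI, NEUR) and 0.767 for (MEDI, NURS) in Eq. (2). With the counts exactly as tabulated, the (NEUR, NURS) pair evaluates to about 0.832 rather than 0.795; the ordering in Eq. (4) is the same in either case.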
Table 2 Summary of three examples for the similarities

We now consider how to incorporate the affinity \({A}_{ij}^{(2)}\) obtained in Section “Data and the method of analysis” into this similarity. As examples, we propose two simple, intuitive methods: (i) multiplying the similarity by the affinity and (ii) adding the affinity to the similarity. In (i), the affinity acts as a weighting factor for the similarity; in (ii), the similarity and the affinity are treated as equivalent information to be added together. (As shown below, in (ii) the affinity must be treated as a frequency between 0 and 1, like the similarity, so the affinity divided by 100 is added. This simply removes the factor of 100 introduced for convenience in Eq. (1).)

First, we consider (i), multiplying the similarity by the affinity, and define a new similarity \({S}_{ij}^{\mathrm{M}}\) as

$$\begin{array}{c}{S}_{ij}^{\mathrm{M}}\equiv {A}_{ij}^{\left(2\right)}{S}_{ij}.\end{array}$$
(5)

Note that while the similarity given in Eq. (3) is invariant under the interchange of i and j, the affinity calculated in Section “Data and the method of analysis” carries bidirectional information (it is asymmetric with respect to i and j: \({A}_{ij}^{(2)}\ne {A}_{ji}^{(2)}\)); thus, the new similarity defined in Eq. (5) also carries bidirectional information, \({S}_{ij}^{\mathrm{M}}\ne {S}_{ji}^{\mathrm{M}}\). Calculating the nontrivial (i ≠ j) similarities \({S}_{ij}^{\mathrm{M}}\) for the three fields MEDI, NEUR, and NURS from this definition, we obtain

$$\begin{array}{c}{S}_{\mathrm{MEDI},\mathrm{ NEUR}}^{\mathrm{M}}\fallingdotseq 3.12,\quad {S}_{\mathrm{NEUR},\mathrm{ MEDI}}^{\mathrm{M}}\fallingdotseq 56.6,\\ {S}_{\mathrm{MEDI},\mathrm{ NURS}}^{\mathrm{M}}\fallingdotseq 2.94,\quad {S}_{\mathrm{NURS},\mathrm{ MEDI}}^{\mathrm{M}}\fallingdotseq 48.0,\\ {S}_{\mathrm{NEUR},\mathrm{ NURS}}^{\mathrm{M}}\fallingdotseq 1.39,\quad {S}_{\mathrm{NURS},\mathrm{ NEUR}}^{\mathrm{M}}\fallingdotseq 1.24,\end{array}$$
(6)

and thus we find

$$\begin{array}{*{20}c} {S_{{{\text{NEUR}},{\text{ MEDI}}}}^{{\text{M}}} > S_{{{\text{NURS}},{\text{ MEDI}}}}^{{\text{M}}} > S_{{{\text{MEDI}},{\text{ NEUR}}}}^{{\text{M}}} > S_{{{\text{MEDI}},{\text{ NURS}}}}^{{\text{M}}} > S_{{{\text{NEUR}},{\text{ NURS}}}}^{{\text{M}}} > S_{{{\text{NURS}},{\text{ NEUR}}}}^{{\text{M}}} ,} \\ \end{array}$$
(7)

where we have used the affinity values \({A}_{\mathrm{MEDI},\mathrm{ NEUR}}^{(2)}\fallingdotseq 3.21\), \({A}_{\mathrm{NEUR},\mathrm{ MEDI}}^{(2)}\fallingdotseq 58.3\), \({A}_{\mathrm{MEDI},\mathrm{ NURS}}^{(2)}\fallingdotseq 3.84\), \({A}_{\mathrm{NURS},\mathrm{ MEDI}}^{(2)}\fallingdotseq 62.5\), \({A}_{\mathrm{NEUR},\mathrm{ NURS}}^{(2)}\fallingdotseq 1.74\), and \({A}_{\mathrm{NURS},\mathrm{ NEUR}}^{(2)}\fallingdotseq 1.57\) obtained in Section “Data and the method of analysis” (Fig. 3).
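As a rough consistency check, Eq. (5) can be inverted to recover the conventional similarities implied by the rounded values above, \(S_{ij}\approx S_{ij}^{\mathrm{M}}/A_{ij}^{(2)}\). The Python sketch below does this and sorts the directed values of Eq. (6). The dictionary names are ours, and because the inputs are rounded, the recovered \(S_{ij}\) are only approximations, not the authors' reported figures.

```python
# Rounded values quoted in the text: S^M from Eq. (6) and the affinities A^(2).
# Keys are ordered pairs (i, j) of ASJC field labels.
S_M = {
    ("MEDI", "NEUR"): 3.12, ("NEUR", "MEDI"): 56.6,
    ("MEDI", "NURS"): 2.94, ("NURS", "MEDI"): 48.0,
    ("NEUR", "NURS"): 1.39, ("NURS", "NEUR"): 1.24,
}
A = {
    ("MEDI", "NEUR"): 3.21, ("NEUR", "MEDI"): 58.3,
    ("MEDI", "NURS"): 3.84, ("NURS", "MEDI"): 62.5,
    ("NEUR", "NURS"): 1.74, ("NURS", "NEUR"): 1.57,
}

# Invert Eq. (5), S^M_ij = A_ij * S_ij, to recover the implied S_ij.
implied_S = {pair: S_M[pair] / A[pair] for pair in S_M}

# S_ij is symmetric, so both directions should agree up to rounding error.
for (i, j), value in implied_S.items():
    if i < j:  # visit each unordered pair once
        print(f"S({i},{j}) = {value:.3f}, S({j},{i}) = {implied_S[(j, i)]:.3f}")

# Sort the directed S^M values in descending order, as in Eq. (7).
ranking_M = sorted(S_M, key=S_M.get, reverse=True)
print(" > ".join(f"{i},{j}" for i, j in ranking_M))
```

Both directions of each pair give nearly the same \(S_{ij}\) (to within about 0.01), as the symmetry of Eq. (3) requires, and the recovered values follow the order of Eq. (4).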

Next, we consider (ii), adding the affinity to the similarity, and define another new similarity \({S}_{ij}^{\mathrm{A}}\) as

$$\begin{array}{c}{S}_{ij}^{\mathrm{A}}\equiv {S}_{ij}+{A}_{ij}^{\left(2\right)}/100.\end{array}$$
(8)

Note that the affinity in the second term is divided by 100 so that, like the similarity \({S}_{ij}\), it lies in the range from 0 to 1; \({S}_{ij}^{\mathrm{A}}\) also carries bidirectional information, \({S}_{ij}^{\mathrm{A}}\ne {S}_{ji}^{\mathrm{A}}\). Following this definition and calculating \({S}_{ij}^{\mathrm{A}}\) for the three fields MEDI, NEUR, and NURS as above, we obtain

$$\begin{array}{c}{S}_{\mathrm{MEDI},\mathrm{ NEUR}}^{\mathrm{A}}\fallingdotseq 1.00,\quad {S}_{\mathrm{NEUR},\mathrm{ MEDI}}^{\mathrm{A}}\fallingdotseq 1.55,\\ {S}_{\mathrm{MEDI},\mathrm{ NURS}}^{\mathrm{A}}\fallingdotseq 0.806,\quad {S}_{\mathrm{NURS},\mathrm{ MEDI}}^{\mathrm{A}}\fallingdotseq 1.39,\\ {S}_{\mathrm{NEUR},\mathrm{ NURS}}^{\mathrm{A}}\fallingdotseq 0.812,\quad {S}_{\mathrm{NURS},\mathrm{ NEUR}}^{\mathrm{A}}\fallingdotseq 0.810,\end{array}$$
(9)

where we have used the same affinity values as for Eqs. (6) and (7). Thus, we find

$$\begin{array}{*{20}c} {S_{{{\text{NEUR}},{\text{ MEDI}}}}^{{\text{A}}} > S_{{{\text{NURS}},{\text{ MEDI}}}}^{{\text{A}}} > S_{{{\text{MEDI}},{\text{ NEUR}}}}^{{\text{A}}} > S_{{{\text{NEUR}},{\text{ NURS}}}}^{{\text{A}}} > S_{{{\text{NURS}},{\text{ NEUR}}}}^{{\text{A}}} > S_{{{\text{MEDI}},{\text{ NURS}}}}^{{\text{A}}} .} \\ \end{array}$$
(10)
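Continuing the same check, and again assuming only the rounded values quoted in the text, the sketch below averages the two directions recovered from Eq. (5) to estimate each symmetric \(S_{ij}\), applies Eq. (8), and sorts the result. The helper name `conventional_similarity` is ours, for illustration.

```python
# Rounded values quoted in the text: S^M from Eq. (6) and the affinities A^(2).
S_M = {
    ("MEDI", "NEUR"): 3.12, ("NEUR", "MEDI"): 56.6,
    ("MEDI", "NURS"): 2.94, ("NURS", "MEDI"): 48.0,
    ("NEUR", "NURS"): 1.39, ("NURS", "NEUR"): 1.24,
}
A = {
    ("MEDI", "NEUR"): 3.21, ("NEUR", "MEDI"): 58.3,
    ("MEDI", "NURS"): 3.84, ("NURS", "MEDI"): 62.5,
    ("NEUR", "NURS"): 1.74, ("NURS", "NEUR"): 1.57,
}

def conventional_similarity(i, j):
    """Estimate the symmetric S_ij by inverting Eq. (5) in both directions and averaging."""
    return 0.5 * (S_M[(i, j)] / A[(i, j)] + S_M[(j, i)] / A[(j, i)])

# Eq. (8): S^A_ij = S_ij + A_ij / 100, the 1/100 removing the percentage scaling.
S_A = {(i, j): conventional_similarity(i, j) + A[(i, j)] / 100 for (i, j) in A}

for (i, j), value in S_A.items():
    print(f"S^A({i},{j}) = {value:.3f}")

# Descending order of the directed S^A values, for comparison with Eq. (10).
ranking_A = sorted(S_A, key=S_A.get, reverse=True)
print(" > ".join(f"{i},{j}" for i, j in ranking_A))
```

The recomputed values agree with Eq. (9) to within about 0.005, and the ordering matches Eq. (10). The gaps among the three smallest values are only a few thousandths, however, so that part of the ordering is sensitive to rounding.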

The results of the three different similarity calculations for the test data are summarized in Table 2. As mentioned above, when neuroscience and nursing are the fields of interest, their affinities with medicine are high, and this also influences the similarity calculations for the sample data in this study. The bidirectional nature of the affinity (i.e., its dependence on which field is the field of interest) can also be reflected quantitatively in the similarity. Our two examples of applying the affinity to the similarity show the following differences and commonality. \({S}_{ij}^{\mathrm{M}}\), in which the similarity is multiplied by the affinity, is strongly weighted by the affinity and amplifies differences in its magnitude. In contrast, \({S}_{ij}^{\mathrm{A}}\) is determined by the balance between the similarity and the affinity. This can be seen from the different positions of the pairs (MEDI, NURS), (NEUR, NURS), and (NURS, NEUR) in Eqs. (7) and (10): under \({S}_{ij}^{\mathrm{M}}\), (MEDI, NURS) ranks above both pairs between NEUR and NURS because \({A}_{\mathrm{MEDI},\mathrm{ NURS}}^{(2)}\) is more than twice \({A}_{\mathrm{NEUR},\mathrm{ NURS}}^{(2)}\) and \({A}_{\mathrm{NURS},\mathrm{ NEUR}}^{(2)}\), whereas under \({S}_{ij}^{\mathrm{A}}\) it ranks last because the added terms \({A}_{ij}^{(2)}/100\) differ by only about 0.02, too little to offset its lower conventional similarity (Eq. (4)). What the two methods have in common is that the values obtained from the conventional similarity, Eq. (3) (Eq. (2)), and their ordering, Eq. (4), are extended to the values in Eqs. (6) and (9) and the orderings in Eqs. (7) and (10), which are calculated from definitions that include bidirectionality (Eqs. (5) and (8)); the result is a higher resolution of similarity.
Furthermore, in the case presented here, reflecting the high affinity of neuroscience and nursing for medicine, \({S}_{\mathrm{NEUR},\mathrm{ MEDI}}^{\mathrm{M}}\) and \({S}_{\mathrm{NURS},\mathrm{ MEDI}}^{\mathrm{M}}\) are higher than \({S}_{\mathrm{MEDI},\mathrm{ NEUR}}^{\mathrm{M}}\), and \({S}_{\mathrm{NEUR},\mathrm{ MEDI}}^{\mathrm{A}}\) and \({S}_{\mathrm{NURS},\mathrm{ MEDI}}^{\mathrm{A}}\) are higher than \({S}_{\mathrm{MEDI},\mathrm{ NEUR}}^{\mathrm{A}}\), even though \({S}_{\mathrm{MEDI},\mathrm{ NEUR}}\) is the highest similarity in the conventional ordering, Eq. (4).
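To see which pairs change position between the two definitions, the illustrative Python sketch below ranks the directed values reported in Eqs. (6) and (9) and lists the pairs whose ranks differ. It uses only the quoted figures; the helper name `ranks` is ours.

```python
# Directed similarities as reported in Eq. (6) (S^M) and Eq. (9) (S^A).
S_M = {
    ("MEDI", "NEUR"): 3.12, ("NEUR", "MEDI"): 56.6,
    ("MEDI", "NURS"): 2.94, ("NURS", "MEDI"): 48.0,
    ("NEUR", "NURS"): 1.39, ("NURS", "NEUR"): 1.24,
}
S_A = {
    ("MEDI", "NEUR"): 1.00, ("NEUR", "MEDI"): 1.55,
    ("MEDI", "NURS"): 0.806, ("NURS", "MEDI"): 1.39,
    ("NEUR", "NURS"): 0.812, ("NURS", "NEUR"): 0.810,
}

def ranks(values):
    """Map each ordered pair to its rank (1 = most similar)."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}

rank_M, rank_A = ranks(S_M), ranks(S_A)

# Pairs whose position differs: (rank under S^M, rank under S^A).
moved = {pair: (rank_M[pair], rank_A[pair]) for pair in S_M if rank_M[pair] != rank_A[pair]}
print(moved)
# {('MEDI', 'NURS'): (4, 6), ('NEUR', 'NURS'): (5, 4), ('NURS', 'NEUR'): (6, 5)}
```

The top three positions, all involving MEDI with large affinities, are the same under both definitions; only the pairs with small affinities are reordered.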

The two new similarities proposed in this study extend the relationships among fields that the similarity can measure, doubling its resolution because the bidirectional affinity yields a separate value for each direction of a field pair, and they also incorporate the affinity among fields derived from the numbers of journals. The conventional similarity \({S}_{ij}\) (Eq. (3)) can be used when discussing similarity arising from researchers' activities (citations, etc.); \({S}_{ij}^{\mathrm{A}}\) (Eq. (8)) can be used when the similarity should also reflect the field distribution of journals; and \({S}_{ij}^{\mathrm{M}}\) (Eq. (5)) is appropriate when the similarity should reflect the field distribution of journals most strongly. Analysts can thus choose among these methods according to their purpose. In this study, we took the similarity as an example of an interdisciplinary research indicator that is affected by the affinity and showed the affinity's quantitative impact. The affinities derived from the field distribution of academic journals, together with their bidirectionality, can also be incorporated into other interdisciplinary research indicators to measure interdisciplinarity more precisely and from a broader perspective.
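The doubling of resolution can be made concrete by counting the distinct off-diagonal values each kind of similarity provides for n fields: a symmetric measure gives one value per unordered pair, n(n − 1)/2, whereas a bidirectional one gives one value per ordered pair, n(n − 1). A minimal Python sketch (the function name is ours):

```python
from math import comb

def pairwise_value_counts(n_fields):
    """Return (#values of a symmetric similarity, #values of a bidirectional one), excluding i = j."""
    symmetric = comb(n_fields, 2)         # one value per unordered pair, S_ij = S_ji
    directed = n_fields * (n_fields - 1)  # one value per ordered pair, S^M_ij != S^M_ji
    return symmetric, directed

print(pairwise_value_counts(3))   # (3, 6): the MEDI/NEUR/NURS example
print(pairwise_value_counts(27))  # (351, 702): all 27 ASJC fields
```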

Limitations of our analyses and results

In proposing a new concept of affinity between fields, we have demonstrated the idea and its calculation using information about the 27 academic fields assigned to journals in Scopus. Every academic journal in Scopus is assigned at least one academic field from the ASJC classification system (although, we note, the method used for this field assignment has not been made clear). It should also be noted that the fields we used are not actual academic fields but the classifications presented by Scopus.

The ASJC system groups all fields except “General” into four broader subject areas. In addition, the 27 fields are divided into 334 more detailed subcategories. The 27-field level was of an appropriate size for demonstrating our concept, but it yields a coarser-resolution analysis than the 334 subcategories might have allowed. Affinity can be calculated in the same manner for these subcategories, but doing so would require addressing problems inherent in the ASJC classification system; for example, some field classifications are confusing, and some fields appear to be very similar. For this reason, we did not extend our calculation of affinity values to the 334 subcategories in this paper.

Because Scopus assigns ASJC fields at the journal level, its classification does not take into account whether the individual articles published in a journal actually fit into the assigned fields. In addition, when a journal is assigned multiple fields, we counted each field equally; we did not take into account whether the articles published in the journal might be biased towards one of those fields.

Scopus is not the only large database of scholarly publications; Web of Science by Clarivate Analytics (previously Web of Knowledge) comprises a suite of databases of citation data in different disciplines. It uses approximately 250 Research Areas to classify content, as well as 22 broad research disciplines in its Essential Science Indicators tool. Other field classification systems include the US National Science Foundation classification system, the Science-Metrix classification system, the University of California San Diego classification system, the Australian and New Zealand Standard Research Classification, and the Chinese Library Classification. Our concept of affinity and the calculation method presented in this paper can be used with other classification systems. However, because the academic fields are different for each classification system, it is not possible to directly compare the affinity values presented in this paper with values based on other classification systems.

We summarize our results

Here, we have proposed a new concept: the affinity between fields in academic research. We define the affinity as the number of journals assigned to both of two fields divided by the number of journals assigned to one of those two fields. The affinity should be examined from two perspectives: the affinity of other fields to the field of interest, and the affinity of the field of interest to other fields. To derive the affinity, we have used information on the academic fields of all journals in Scopus, the largest database of peer-reviewed literature, which covers all academic fields and contains journals, conference proceedings, and books. Every academic journal in Scopus is assigned one or more of the 27 fields. With the exception of “General”, these fields are all grouped into four larger subject areas. Scopus data and the 27 fields of the ASJC classification system are of an appropriate size for demonstrating our new concept of the affinity between fields. Moreover, Scopus data is freely downloadable by anyone, which makes it easier for others to reproduce our results.
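The definition above can be sketched in a few lines of Python. This is an illustrative reading, not the authors' code: we assume that \(A_{ij}^{(2)}\) is 100 times the number of journals assigned to both i and j divided by the number assigned to the first index i, and, as described in the limitations above, every field of a multi-field journal is counted equally. Which of the two fields goes in the denominator is our assumption and may need to be swapped to match Eq. (1). The journals and their field assignments are invented toy data.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy data: journal -> set of ASJC fields assigned to it (invented for illustration).
journal_fields = {
    "J1": {"MEDI"},
    "J2": {"MEDI", "NEUR"},
    "J3": {"MEDI", "NURS"},
    "J4": {"NEUR"},
    "J5": {"MEDI", "NEUR", "NURS"},
    "J6": {"MEDI"},
}

def affinity(journal_fields):
    """A[(i, j)] = 100 * (#journals assigned to both i and j) / (#journals assigned to i).

    Assumed convention: i is the field in the denominator; each field of a
    multi-field journal is counted equally (no weighting by article content).
    """
    n_single = Counter()  # journals per field
    n_pair = Counter()    # journals per ordered pair of distinct fields
    for fields in journal_fields.values():
        n_single.update(fields)
        n_pair.update(permutations(fields, 2))  # both (i, j) and (j, i)
    return {(i, j): 100 * n / n_single[i] for (i, j), n in n_pair.items()}

A = affinity(journal_fields)
print(A[("NEUR", "MEDI")], A[("MEDI", "NEUR")])  # asymmetric: 2 of 3 NEUR journals vs 2 of 5 MEDI journals
```

Because the denominators differ, the two directions of a pair generally take different values, which is the bidirectionality discussed above.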

Our detailed analyses reveal the affinity between fields. In the Life Sciences, “Immunology and Microbiology” is the field with the highest affinity (28.8) to “Biochemistry, Genetics and Molecular Biology”, which indicates that 28.8% of the academic journals in “Immunology and Microbiology” are also assigned to “Biochemistry, Genetics and Molecular Biology”. Similarly, “Biochemistry, Genetics and Molecular Biology” has the highest affinity (8.40) to “Immunology and Microbiology”. In the subject area of Social Sciences, “Arts and Humanities” has the highest affinity (58.1) to the field of “Social Sciences”, and “Social Sciences” has the highest affinity (35.5) to “Arts and Humanities”. These values are also the highest from the standpoint of each field. In Physical Sciences, “Chemistry” and “Chemical Engineering” have the highest affinity for each other (27.2 and 32.3, respectively), as do “Environmental Sciences” and “Earth and Planetary Sciences” (33.6 and 37.6, respectively). In Health Sciences, “Nursing”, “Dentistry” and “Health Professions” each have the highest affinity to “Medicine” (62.5, 32.1 and 67.2, respectively). The fields with the highest bidirectional affinity are “Arts and Humanities” and “Social Sciences”, and “Earth and Planetary Sciences” and “Environmental Sciences”. Medicine is the field that most often has the highest affinity to the field of interest, securing the top position for seven fields (the first four of which are in Life Sciences): “Biochemistry, Genetics and Molecular Biology” (43.1), “Immunology and Microbiology” (57.5), “Neuroscience” (58.3), “Pharmacology, Toxicology and Pharmaceutics” (43.8), “Nursing” (62.5), “Dentistry” (32.1), and “Health Professions” (67.2).

The affinity plays a weighting role in indicators calculated from citation relationships and the like, as shown in Section “An example of applications”. This means that discussions of the degree of interdisciplinary research based on such indicators will better reflect the current relations among fields. As discussed there, the conventional similarity \({S}_{ij}\) (Eq. (3)), \({S}_{ij}^{\mathrm{A}}\) (Eq. (8)), and \({S}_{ij}^{\mathrm{M}}\) (Eq. (5)) can be chosen according to whether the similarity should reflect researchers' activities (citations, etc.) alone, should also reflect the field distribution of journals, or should reflect that distribution most strongly; analysts can thus choose the method that matches their purpose. In addition, by reflecting the bidirectional nature of the affinity, it is possible to give a new bidirectional view to characteristics (e.g., similarity) that have so far been expressed only in a single direction. The concept of affinity between fields might also be used by researchers (and/or university research administrators) who are considering extending their research theme or analyzing research trends, and it could help researchers find suitable academic journals for submitting their work. From the perspective of analyzing the research activities of research institutions and other organizations, introducing the affinity concept is expected to improve the accuracy of measuring the interdisciplinarity of research, thereby contributing to the validation of the effectiveness of interdisciplinary research and to discussions of research strategy formulation. Concrete applications, drill-down into smaller subcategories, and analysis of other journal subject classifications are left for future work.