Introduction

Given that scientific collaboration is currently the main mode of knowledge production, its value is enormous. Co-authored papers are more frequently cited and have greater societal impact than ever before1,2. Research collaborations not only drive innovation and disciplinary breakthroughs but also contribute significantly to solving complex and important social problems, thus becoming one of the most important forms of knowledge creation and management. Against this background, the driving forces behind collaborations are receiving increasing attention from researchers. Studies have found a significant linear relationship between collaboration diversity, which primarily refers to the degree of variation among team members’ characteristics, and publication in high impact factor journals; collaboration diversity positively predicts citation rates3,4,5. Specifically, collaboration diversity’s ability to increase the visibility of research (through more diverse social networks) may increase academic impact6. If these factors are excluded, it still suggests that research papers with higher collaboration diversity are more popular with peer reviewers7. In today’s rapidly evolving knowledge economy, researchers aim to maximise the benefits of extensive scientific collaborations across different organisations or countries8. However, in the context of increasing team sizes, it remains unclear whether continually increasing diversity always plays a positive role. While diversity promotes complementary skills and knowledge reconfiguration among members, it also increases cognitive costs and affective biases to some extent9, which may reduce the efficiency of collaboration. In the relevant literature, the marginal benefits of collaboration diversity for research performance are yet to be explored in depth. Such an investigation may provide an empirical basis and insights for understanding scientific activity patterns and research team management, and this is the question that the present study seeks to address.

Collaboration diversity primarily refers to the degree of variation among members’ characteristics. Previous studies have examined the relationship between collaboration diversity and research performance in terms of research members’ characteristics, such as institution, ethnicity, and country10,11,12,13. Nevertheless, if we examine collaboration as a research activity based on knowledge interaction, the diversity of research participants in terms of their knowledge backgrounds is particularly crucial. As a typical example of knowledge diversity in research collaboration, interdisciplinary collaboration involves researchers from different disciplines using a problem-oriented approach to conduct academic research through multiple forms of collaboration. This approach allows them to integrate multidisciplinary forms of knowledge and produce innovative research results14. From the perspective of scientific communication, it is the scientific community that brings its specialised knowledge, research logic, and way of thinking into the process of scientific activity and thereby accomplishes a breakthrough in interdisciplinary research.

Prior studies examining the relationship between interdisciplinary collaboration and research performance have mainly used academic impact as an indicator of research performance. This may represent a limitation, given that they excludes civil society and public discourse, which were identified as an essential part of current high-level knowledge production (“Mode 3”)15. The participation of the public activates new ideas for solving problems and likewise accelerated the practice of scientific research. Along these lines, Carayannis and Campbell16 added “civil society” to the “academia-industry-government” model to achieve an ecological balance of innovation in the public interest. They constructed a quadruple helix model of the dynamics of knowledge production, which called for a shift in the focus of knowledge evaluation systems from academic to societal impact. For example, the Research Excellence Framework (REF) in the UK or Moed and Halevi17 advocated a multidimensional evaluation system that separated the impact of science into academic, societal, economic and other. Among them, societal impact broke the limitations of recognition by the academic community only. It emphasized the degree of interest which non-academic workers, such as policy makers and people in society, take in research18. As an important channel of communication and dissemination between academia and society at large, social media facilitated the behavior of interaction between them. Based on that societal impact was generated19. In this context, the quantitative analysis of altmetrics based on social network data has also emerged. It was considered “the creation and study of new social network-based metrics to analyse academic intelligence”20, reflecting the level of public interest in scientific research. To some extent, this provided a measure for societal impact-oriented research evaluation systems21,22. Altmetrics were more useful for uncovering general patterns than qualitative case study research which were widely applied in research on the evaluation of the societal impact of research outcomes23,24.

Only a few studies have explored the relationship between knowledge or disciplinary diversity and societal impact in recent years, and the results have been controversial. Some scholars have found that the usefulness of papers increases in proportion with their disciplinary diversity25,26,27. Conversely, Shi et al.28 divided the highly cited papers in the field of library intelligence into high, medium, and low interdisciplinary groups and found that interdisciplinarity was highly negatively correlated with societal impact. It is important to note that the above-mentioned studies measured the disciplinary diversity of papers through references. They focused on the disciplinary origin and related structure of the research participants, which helped characterise the paths of the knowledge flow29,30. Such studies highlighted interdisciplinary citation patterns; they did not directly reflect the patterns of communication and collaboration between research participants from different disciplinary backgrounds31. Critically, the relationship between disciplinary diversity and societal impact in interdisciplinary collaboration remains to be explored.

In summary, this study proposes to conduct research on the relationship between diversity in interdisciplinary collaboration and societal impact. To address this question, we measured the level of interdisciplinary collaboration diversity using the disciplinary attributes of authors of publications (details in “Methods”) and reflected the societal impact of the co-authored papers through altmetrics.

The relationship between interdisciplinary collaboration diversity and societal impact

Understanding the essence of the relationship between knowledge diversity in interdisciplinary collaboration and societal impact involves investigating the impact of collaboration diversity on research performance32. Most studies suggest that collaboration diversity has a positive effect on research output, which encourages research institutions to actively engage in diverse collaborations and researchers to seek new partners5,11,33,34. However, other studies have found that heterogeneity in team members” knowledge hinders creativity and reduces performance9,35. These inconsistent findings may reflect the fact that this relationship is non-linear.

Theoretically, knowledge diversity among collaborating members is often considered a double-edged sword. On the one hand, in terms of information processing, knowledge diversity challenges existing knowledge structures, stimulates creativity and divergent thinking, and facilitates knowledge reconstruction by identifying and integrating knowledge from different fields9,36. Just as the weak ties hypothesis emphasises the importance of different resources, richer and more diverse information contributes to improved collaborative performance37,38. On the other hand, increasingly heterogeneous knowledge can exacerbate the knowledge boundaries among team members and increase the cognitive costs required for shared knowledge construction39.

Similar trends were observed at the level of team interactions. Highly heterogeneous teams experienced some degree of conflict when collaborating because of their different perceptions of the task40,41. Moderate levels of task conflict are more conducive to complex issue resolution than excessively high or low levels42. Simultaneously, researchers’ pride in their own discipline can prompt protective behaviour in the domain of knowledge. Moreover, the use of technical discipline-specific terms may deepen communication barriers and cause other challenges43. The effectiveness of team communication and decision-making decreases as diversity increases44.

It is clear from the above that as knowledge diversity increases, the advantages do not always outweigh the disadvantages; in terms of diversity, there can be “too much of a good thing”. There is a threshold of positive effects beyond which undesired outcomes are produced, resulting in an inverted U-shaped non-linear relationship45,46. Accordingly, we defined Hypothesis as follows:

The impact of knowledge diversity in the relationship between interdisciplinary collaboration and societal impact forms an inverted U-shape. As diversity increases, societal impact reaches a peak and then tends to decline.

Results

Descriptive statistics and correlation analysis

The results of the descriptive statistics and correlation analysis of the variables revealed significant positive correlations between Tweets mentioned counts and the number of subjects (Table 1). To avoid the influence of outliers on the results, the data were Winsorized to trime the dependent variables at the 99% quantile. The final sample quantity was 1539. Related results showed that the correlation coefficients were below the critical value of 0.7, indicating there were no serious collinearity issues between the variables. Accordingly, we proceeded with further hypothesis testing and regression analysis.

Table 1 Descriptive statistics and correlation coefficients.

Regression analysis

Given that the variance of the dependent variable was larger than the mean and showed an over-dispersed distribution (i.e. it did not meet the requirements of a general multiple linear regression), we modified the model using a negative binomial regression applicable to asymmetric datasets. The variance inflation factor (VIF) for each variable was less than 10, excluding the possibility of multicollinearity between the independent and control variables. The regression Eq. (1) is as follows:

$$TMC={\beta }_{0}+{\beta }_{1}\left(\mathrm{Subject count}\right)+{\beta }_{2}\left({\mathrm{Subject count}}^{2}\right)+{\beta }_{3}\left(\mathrm{Organization count}\right)+{\beta }_{4}\left(\mathrm{Author count}\right)+{\beta }_{5}\left(\mathrm{Journal}\right)+{\beta }_{6}\left(\mathrm{Time lag}\right)+{\beta }_{7}\left({\mathrm{Fields}}_{\mathrm{PSE}}\right)+{\beta }_{8}\left({\mathrm{Fields}}_{\mathrm{LES}}\right)+{\beta }_{9}\left({\mathrm{Fields}}_{\mathrm{BHS}}\right)+{\beta }_{10}\left({\mathrm{Fields}}_{\mathrm{SSH}}\right)+{\beta }_{11}\left({\mathrm{Fields}}_{\mathrm{MCS}}\right)$$
(1)

Model 1 showed the regression results for all the control variables on societal impact. Models 2 and 3 tested the regression results for the linear term and the quadratic term for the societal impact on diversity in interdisciplinary collaboration, respectively (Table 2). The 95% confidence interval for the alpha value of the negative binomial regression model did not include zero, indicating that the model rejected the original hypothesis of “alpha = 0” at the 5% significance level; hence, the negative binomial regression model was basically acceptable47.

Table 2 Regression analysis results.

As seen in Fig. 1a, the fitted curve for Tweets mentioned count to subject count showed an inverted U-shaped trend. This study adopted the AIC and BIC statistic to determine the fit of the quadratic model versus the linear model. When comparing the two, a smaller AIC or BIC means a better model fit. It was found that the quadratic model revealed more patterns in the data than the linear model (See Models 2 and 3 of Table2 for detailed results). Likewise, the Pseudo R2 data for the three models supported that the model fit was better for the quadratic terms.

Figure 1
figure 1

(a) Scatter plot and fitted curve of number of Tweets mentioned count vs. subject count (b) Regression function graph (c)Average marginal effects of subject count with 95% CIs.

The results showed that the linear term for knowledge diversity in interdisciplinary collaboration positively predicted the number of Tweets, while the regression results for the quadratic term were negative and significant (β1 = 0.099, p < 0.001, β2 = − 0.006, p < 0.001), thus supporting Hypothesis. To further test the inverted U-shaped hypothesis, we adopted the utest command in Stata and found that the extreme value points were within the data limits (1 < 6.62 > 22), and the 95% confidence interval was [4.745, 8.892]. As such, the null hypothesis of no U-shape could be rejected at the 5% significance level (t = 3.85, p < 0.001). The trends in collaboration diversity and societal impact are plotted against the negative binomial regression results in Fig. 1b.

Figure 1c showed a further analysis of the marginal effects. It was found that as the diversity of interdisciplinary membership increases, the marginal effect on societal impact continues to diminish and even becomes negative. At an average number of disciplines of 6.6, the equilibrium point between costs and benefits is reached. However, the costs then begin to outweigh the benefits and the slope becomes negative.

Robustness tests

To verify the regression results, we conducted robustness tests using another important measure of societal impact: altmetric attention score (Table 3 and Fig. 2a,b). Since altmetric attention score was a non-integer data, it was not rigorous to employ negative binomial regression analysis. This study adopted boxcox transformed dependent variable data for OLS regression analysis. The results showed that the linear term for collaboration diversity positively predicted altmetric attention score, while the regression results for the quadratic term were negative and significant (β1 = 0.162, p < 0.01, β2 = − 0.012, p < 0.01).

Table 3 Regression analysis results.
Figure 2
figure 2

(a) Scatter plot and fitted curve of number of altmetric attention score vs. subject count (b) Regression function graph (c)Average marginal effects of subject count with 95% CIs.

To further test the inverted U-shaped hypothesis, we adopted the utest command in Stata and found that the extreme value points were within the data limits (1 < 6.823 > 22), and the 95% confidence interval was [5.045, 9.865]. As such, the null hypothesis of no U-shape could be rejected at the 5% significance level (t = 3.18, p < 0.001). Furthermore, the results of the marginal effects analysis similarly indicated that the instantaneous slope of subject diversity and societal impact shifted from positive to negative. This suggested that when the number of disciplines exceeded the optimal size, the marginal costs outweigh the marginal effects and the negative impact becomes dominant (See Fig. 2c).

Discussion and conclusion

Main findings

Grounded in the question of whether diversity in collaboration is always optimal, this study explored the relationship between collaboration diversity and societal impact in interdisciplinary research, and the role of cognitive distance. Based on a literature review and theoretical analysis, we suggested that this relationship may take the form of an inverted U-shaped pattern. We used a sample of co-authored studies published in Nature and Science to test the hypotheses. The results indicated a significant inverted U-shaped relationship between the number of cross-disciplinary collaborators and both the number of times the paper was mentioned on Twitter and the altmetric attention score. This implied that as the knowledge diversity of the collaboration members increases, societal impact tends to decline after it reaches a certain peak.

Current mainstream research has dissected the benefits of collaboration diversity from a variety of perspectives. Philosophical explanations can be traced back to Aristotle’s famous defence of the “wisdom of the multitude”: “Hence the many are better judges than a single man of music and poetry; for some understand one part, and some another, and among them they understand the whole”48. This emphasises the role of diversity in facilitating collective decision-making. Furthermore, cognitive neuroscience experiments have shown that the proximity of other people’s ideas effectively enhances interpersonal brain synchronisation, thereby increasing team creativity and vice versa49. Contrastingly, some meta-analytic studies have found small or zero effect sizes for the positive relationship between demographic or cultural diversity and team performance50,51. Hence, it is evident that logical deduction and empirical studies continue to note both the advantages and disadvantages of collaboration diversity. This controversy may indicate that the advantages of collaboration diversity have their own specific scope and boundary conditions52. This study confirms, from an intellectual context, that the benefits of cooperative diversity may be reduced after a certain peak due to excessive costs. In scientific research, where divergence and convergence processes alternate, diversity is essential if scientists are to produce novel and rigorous results53; it is also important to pay attention to the cost of too much diversity and avoid inefficiency and the phenomenon of “too many cooks spoiling the broth”.

It is interesting to note that this finding is similar to the optimal scale effect found in recent years in scientific collaboration. Price’s famous prediction54 that scholarly publications will “move steadily toward an infinity of authors per paper” is being borne out, with the “scientist-as-lonely genius” myth becoming further detached from reality55,56. In fact, the benefits of team size were shown not to follow a linear growth pattern. With the continued “inflation” of collaboration size, citation rates appeared the tendency of reduction, which presented an inverted U-shaped relationship7,57,58. As Wu et al.13 found, large teams tend to develop existing science and technology more than small teams, resulting in fewer disruptive innovation breakthroughs. Increased team size is often accompanied by greater diversity among team members58, introducing more collective intelligence and innovative perspectives. There are exceptions, such as very diverse small groups and large, highly homogenous teams. The key question is whether size or diversity is more important in the relationship between interdisciplinary collaborative research and research performance. Collective intelligence research has focused on this relationship, with Condorcet’s jury theorem predicting that heterogeneous team composition may be more important when faced with highly complex decisions which are susceptible to bias59. To improve the accuracy of collective decision-making, a high level of diversity needs to be maintained if team size is to be continuously increased60. Despite controlling for the variable of team size, the present study still found an inverted U-shaped distribution in the relationship between collaboration diversity and societal impact, indicating the relatively greater importance of diversity in collective decision-making performance compared to team size. Zhu et al.58 discovered that research team members’ own research diversity played a moderating role in the relationship between team size and the influence of the research. In any case, our conclusions need to be interpreted with caution, as the relationship between size and diversity may be reversed under other conditions. For example, in real-world social activities, the performance of collective decision-making depends on a variety of factors, such as average individual accuracy and decision bias59,61.

Implications

In theoretical terms, this study focused on the performance of collaboration diversity from the perspective of knowledge diversity in interdisciplinary collaboration. As an important cognitive basis for scientific collaboration, knowledge diversity among team members is a key element that influences innovation performance. In contrast to previous studies that have emphasised the value of collaboration diversity, this study quantitatively verified that collaboration diversity is a double-edged sword, peaking in the number range of 6 to 7, after which its negative effects may outweigh the positive ones.

In current society, the promotion of scientific and technological innovation through interdisciplinary collaboration has become a major concern for national governments and research institutions. This study provides important insights into the formation of research teams and knowledge management. It is important to be aware of the differences in the disciplinary backgrounds among team members and to try to optimise the level of disciplinary diversity. During collaboration, research teams, especially those working at the intersection of the humanities and sciences, need to be aware of potential communication barriers and conflicts among members.

Limitations and future research

Note that while research on societal impact has widely used altmetric as indicators, we are aware of the limitations of measuring societal impact through analysis of social media activity. The findings of this study need to be validated with multiple sources of data and research methods. For example, societal impact is not only measured by the level of attention but also by the content. It is necessary to explore metrics for quantifying the valance of social evaluation. Furthermore, in addition to diversity and differences in disciplinary distances, interdisciplinary collaboration also involves disparity and balance. It is necessary for future research to reveal the patterns in interdisciplinary collaboration from more dimensions.

As a major part of scientific activity, scientific collaboration is a dynamic and changing process of cognitive interaction comprising a comprehensive and complex collection of intertwined factors, such as member attributes and interactions. This study only focused on the relationship between the disciplinary diversity of team members and the outcome of collaboration which reflected in societal impact. It remains to be seen how disciplinary diversity affects the innovation performance of collaboration and how organisations can respond to the “marginal dilemma” identified in the present study. Future research should explore how to effectively control the decline in costs and creativity, while maximising team members’ innovative behaviour.

Methods

Data

Nature and Science are multidisciplinary journal, and choosing papers from the same journal in the same year would exclude the effect of different impact factors and years of publication. Thus, we conducted a search of the Web of Science Core Collection which was limited to the journal Nature and Science, the publication date “2018” and the literature type “Article”, giving a total of 1596 articles. The data collection process was as followed. First, the basic information of each article was obtained, such as the DOI, authors, institutions, publication date, and so on. Second, using the DOI of each article, the number of mentions on Twitter and the overall altmetrics score was obtained from altmetric.com. The above data were crawled with the help of the altmetrics package for Python (version 3.6.6) for 3 August 2022.

To avoid missing data and ensure the matching of information in the literature, the following papers were excluded: retracted papers; papers with missing information, such as authors’ institutions; papers that was missing altmetrics data; single-author papers; or papers with unclear information on the discipline covered by the primary or secondary institution. The final valid sample comprised 1554 papers.

Variables

The variables used in this study were defined in Table 4. The dependent variable was societal impact. The independent variable was diversity in interdisciplinary collaboration. The control variables were the date of publication, institutions, and number of authors. The specifics of societal impact, interdisciplinary collaboration diversity, and control variables were detailed below.

Table 4 Measures of variables.

Societal impact

The dependent variable was the societal impact of each paper. To measure it, we adopted the main altmetrics indicator, namely the number of mentions on Twitter62. Twitter, as a major public social media platform, broadens the audience and dissemination channels of academic research findings and helps to improve the understanding and application of academic research by the public63.

Interdisciplinary collaboration diversity

Referring to the studies of Zhang et al.31 as well as Zhang and Zhang64, we obtained the number of disciplines involved in each study to measure the diversity of interdisciplinary collaboration from a scientific activity perspective, starting with information on each author’s institution. Data processing was conducted as followed (Fig. 3). First, the complete address of each institution was extracted, and the characteristic words of the secondary institution were identified, which included information on the discipline. If the secondary institution did not show subject information, the characteristic words of the primary institution were identified. Second, a disciplinary classification scheme was constructed based on the 254 disciplinary categories of the Web of Science and their corresponding five scientific disciplines (Life Sciences & Biomedicine, Technology, Arts & Humanities, Social Sciences, and Physical Sciences). In addition, the subject information words of each institution were matched to the subject categories in the subject classification scheme. Non-English words or ambiguous abbreviations and acronyms were manually checked to ensure accuracy. Following this method, we were able to count the number of different disciplines involved in each publication.

Figure 3
figure 3

Procedure for measuring knowledge diversity in interdisciplinary research.

Control variables

This study controlled for the number of authors, institutions and the type of journal; it was suggested that these variables have a potential correlation with the academic impact of scientific papers30. Considering the real-time nature of altmetric data facilitated by online research communication, we obtained the difference between the date of publication and obtaining altmetric data. This helped control for the effect of the differences in the times of publication of the papers in the sample. Due to the wide variation in the societal attention to research on different topics65, this study used subject classification data from the Leiden ranking, i.e., flagging one or more of the main fields covered by the publication. The fields involved were respectively: Biomedical and health sciences, Life and earth sciences, Mathematics and computer science, Physical sciences and engineering and Social sciences and humanities. We included in our analysis the coverage of each of these fields.