Introduction

Policy actors regard science, and more specifically scientific knowledge, as a valuable instrument for creating opportunities for a country's innovation and economic development. Studies have suggested that scientific knowledge, generated through research activities and disseminated in publications, is positively associated with economic growth (e.g. Inglesi-Lotz & Pouris, 2013; Inglesi-Lotz et al., 2014; Ntuli et al., 2015). Given the role of science, several policy instruments have been developed and implemented to foster the RIS (Boekholt et al., 2009). Well-known instruments are those aimed at increasing research collaboration (RC), mainly in an international context (Boekholt et al., 2009).

RC is likely to provide scientists, and consequently the RIS to which they belong, with numerous advantages. Among them are access to tacit and codified knowledge and to costly and complex equipment, the opportunity to expand and diversify research networks, and the chance to explore new scientific problems and obtain funding (e.g. Katz & Martin, 1997; Melin, 2000; Owusu-Nimo & Boshoff, 2017; Thorsteinsdottir, 2000; Zdravkovic et al., 2016). Another benefit frequently addressed in the literature is a higher citation impact of scientific knowledge, as measured by the number of citations received (throughout the paper, I use the terms citation impact, scientific impact, the impact of scientific articles, and the impact of scientific knowledge interchangeably). The empirical literature has shown that publications resulting from RC are likely to receive more citations than publications without collaboration (e.g. Beaver, 2004; Frenken et al., 2010; Sooryamoorthy, 2009; Van Raan, 1998). Studies examining the relationship between RC and impact have been conducted at the micro level (individual publications or individual scientists), the meso level (universities), and the national level (countries) (e.g., Aldieri et al., 2018; Didegah & Thelwall, 2013b; Katz & Hicks, 1997; Lee & Bozeman, 2005; Leydesdorff et al., 2019; Peters & van Raan, 1994; Puuska et al., 2014), controlling for other factors that might also influence the scientific impact of knowledge. Such factors include the attributes of publications and authors (e.g., Bornmann et al., 2012a, 2012b; Didegah & Thelwall, 2013b; Onodera & Yoshikane, 2015) and expenditure on research and development (R&D) (Bordons et al., 2015; Chinchilla-Rodriguez et al., 2019; Leydesdorff et al., 2019), which some studies treat as a proxy for scientific capacity (e.g., Chinchilla-Rodriguez et al., 2019).

In this paper, I have examined the effect of RC on the impact of the articles of the countries that constitute the EU-27 plus the United Kingdom (UK), Norway, Switzerland, Serbia, Turkey, Iceland, North Macedonia, Ukraine and Israel, taking into account that this effect is related to the development of the RIS of each country involved in the collaboration and controlling for other variables of interest. I contribute to the literature in this stream in several ways. First, unlike the extant literature in the area (e.g., Bordons et al., 2015; Chinchilla-Rodriguez et al., 2019), I have looked at the effect of RC on the scientific impact of countries considered to be modest innovators, moderate innovators, strong innovators, and innovation leaders, considering the interaction of RC with the level of development of the RIS of these countries. As for RC, I have disaggregated the information into five types of collaboration (DRC and collaborations with countries whose RIS are at different levels of development) and estimated the incremental effect of a given type of collaboration relative to different baseline collaborations. Second, in contrast to the literature that has used expenditure on R&D as a proxy for scientific capacity (e.g., Bordons et al., 2015; Chinchilla-Rodriguez et al., 2019), I have used a more comprehensive indicator of the development of a country's RIS: the Summary Innovation Index of the European Innovation Scoreboard (EIS). This allows a better characterisation of the RIS, as it takes into account the main drivers of innovation: investments in the public and private sectors, the different aspects of innovation in the business sector, and the effects of firms' innovation activities (Hollanders et al., 2020). Third, I have included another variable that has been shown to affect citations of publications but that, at the time of this study, had not been explored in the presence of RC: whether publications are open access. Fourth, I have demonstrated the applicability of fractional regression models, which are widely applied in econometrics, to the scientometric field.

The paper is organised as follows. In the next section, I review the literature that discusses the influence of RC and other variables on scientific impact and present the framework supporting the formulated hypotheses. The Methodology section describes the dataset, the variables, and the model specification. The following section presents and discusses the descriptive statistics and the regression results. The final section summarises the main findings and limitations of the study and suggests avenues for future research.

Literature review and hypotheses

RC is a complex phenomenon because of the nature of the human interactions among collaborating scientists (Subramanyam, 1983). Consequently, there is no single definition of RC. In attempting to define RC, Katz and Martin ended up suggesting criteria for identifying collaborators (Katz & Martin, 1997). Bozeman and Boardman defined RC as “social processes whereby human beings pool their experience, knowledge and social skills with the objective of producing new knowledge, including knowledge as embedded in technology” (Bozeman & Boardman, 2014).

It has been claimed that RC allows scientists to produce knowledge of higher quality (Melin, 2000). Indeed, several empirical studies have shown that papers co-authored by several scientists attract more citations (citations have frequently been used as a proxy for quality) than single-authored papers (e.g. Beaver, 2004; Frenken et al., 2010; Puuska et al., 2014; Sooryamoorthy, 2009; Van Raan, 1998). Why does RC lead to higher-quality work?

Without claiming to be exhaustive, I describe some scenarios in which collaboration may foster higher-quality outputs.

Scientific problems are becoming more complex, and scientists increasingly have to deal with data-rich and computationally intensive projects (societal challenges are examples of such projects). Most of these problems call for interdisciplinary research, as it allows holistic, integrative approaches drawing on a variety of disciplines (Morillo et al., 2003; NSF, 2009). It is therefore necessary to combine the skills and expertise of different scientists. This combination is likely to promote a more rigorous review of the research, thus increasing the quality of the final outputs.

The knowledge shared by scientists in the form of publications is essential for the progress of science. However, tacit knowledge cannot be learned through publications. Here, collaboration plays an important role, as this type of knowledge can be better transferred and combined through this mechanism (Gertler, 2003; Storper & Venables, 2004). Therefore, we can expect collaborative research to produce results of higher quality.

PhD students usually lack sufficient experience, context and knowledge to conduct independent research in the early stages of their training (Bozeman & Corley, 2004). Therefore, their collaboration with senior scientists (e.g., in the form of mentoring) is essential to achieve high-quality results.

Exchanging knowledge and regularly discussing the work with other scientists can help identify problems that were not considered when planning the research but that are important for improving the quality of the research activities.

From the previous points, I anticipate that:

H1

Collaborative activities lead to scientific knowledge of higher impact than non-collaborative activities.

As far as the spatial level is concerned, RC has mainly been studied at the domestic and international scales. Domestic research collaboration (DRC) is usually defined as research activities between scientists working at different institutions located in the same country, while international research collaboration (IRC) includes research activities between scientists working in different countries (Katz & Hicks, 1997).

Various studies have shown that papers with IRC tend to receive more citations than those with DRC (e.g., Adams, 2013; Katz & Hicks, 1997; Potter et al., 2020). This is understandable: compared with DRC, research with scientists from another country is considerably more likely to draw on a complementary knowledge base, and therefore to produce knowledge of higher quality. In addition, IRC may extend the geographical reach of the dissemination of the knowledge generated, as knowledge can flow through the different networks of the co-authors (Goldfinch et al., 2003), increasing the likelihood that it is used and cited.

However, I believe that the impact of IRC relative to DRC depends on the development of the RIS. Countries whose RIS have a long research tradition, and therefore a large stock of accumulated scientific knowledge, well-equipped scientific facilities, stable and well-developed research networks and rich human capital in R&D, will benefit less from IRC. In this situation, there are more opportunities to find a partner with complementary knowledge and other resources (e.g., scientific equipment) within national borders (Frame & Carpenter, 1979; Narin et al., 1991). The dissemination of new knowledge will also be broader, given the well-established national and international networks of scientists in these countries, which result from their central position in the international research network and the large number of countries with which they co-publish (Chinchilla-Rodríguez et al., 2018; Vieira & Cerdeira, 2022; Vieira et al., 2022). If IRC offers only a small knowledge gain over DRC, scientists are likely to opt for DRC, because the costs and risks of IRC are higher owing to differences in institutional governance and culture and to the geographical distance that might separate collaborators (e.g., Hoekman et al., 2010; Vieira et al., 2022b). This might explain why these countries have lower rates of mobility and international collaboration than countries with less developed RIS (Chinchilla-Rodríguez et al., 2018). The large overlap of the interquartile ranges of the category normalised citation impact across collaboration types for the United States of America and the UK in the study of Potter et al. (2020) is a first indication in support of these arguments. However, the effect of IRC on the scientific impact of a country i also depends on the RIS of the collaborating country. If the collaborating country has a more developed RIS than country i, we should expect IRC to have a greater effect on the scientific impact of country i than DRC. A question that inevitably arises is: why do scientists working in countries with well-developed RIS collaborate with scientists in countries with less-developed RIS? The answer can be found in the internationalisation of the RIS, namely the promotion of higher education institutions abroad, the need to tackle societal issues and challenges through research, and the necessity of maintaining good and stable diplomatic relations (Boekholt et al., 2009).

Therefore, we expect that:

H2

If the scientists of a country with a well-developed RIS collaborate with scientists from a country with a less-developed RIS, then the scientific impact of these publications is lower than the impact of the publications with DRC of the country with a well-developed RIS.

H3

If the scientists of a country with a less-developed RIS collaborate with scientists from a country with a more developed RIS, then the scientific impact of these publications is greater than the impact of the publications with DRC of the country with a less-developed RIS.

So far, I have discussed the role of RC in the impact of scientific knowledge. However, studies have shown that impact may depend on other factors: scientists’ scientific and non-scientific motivations (Baldi, 1998; Merton, 1973; Shadish et al., 1995; Vinkler, 1987), paper attributes such as the number of authors, number of pages, number of references, journal prestige, accessibility, quality and novelty, scientific discipline, and type of paper (e.g., Basson et al., 2021; Bornmann et al., 2014; Breugelmans et al., 2018; Buela-Casal & Zych, 2010; Chen, 2012; Craig et al., 2007; Didegah & Thelwall, 2013a, 2013b; Eysenbach, 2006; Gargouri et al., 2010; Haslam & Koval, 2010; McCabe & Snyder, 2014; Moed, 2007; Peng & Zhu, 2012; Peters & van Raan, 1994; van Dalen & Henkens, 2001; Vieira & Gomes, 2010), and authors’ network positions (Biscaro & Giupponi, 2014; Colebunders et al., 2014; Miranda & Garcia-Carpintero, 2018; Peters & van Raan, 1994; Tahmooresnejad et al., 2021; Uddin et al., 2013; Yan & Ding, 2009; Zhang et al., 2021).

The scientists’ scientific motivations are related to the normative theory of citing behaviour as defined by Merton (1973). This theory assumes that authors select papers to cite based on their intellectual relevance. Scientists’ non-scientific motivations are associated with the social constructivist view of citing behaviour (Baldi, 1998; Case & Higgins, 2000; Gilbert, 1977; Shadish et al., 1995; Vinkler, 1987). According to this view, scientists may choose to cite a particular paper because, among other reasons, it was published in an important and respected journal; it was written by widely known, respected authors with strong professional recognition; they maintain, or wish to establish, a professional connection with the cited authors; or they want to convince the audience in their scientific community to share their opinions.

The length of a paper may affect its scientific impact, as longer papers may contain more findings that contribute to the advancement of knowledge. Moreover, I believe that journal editors and reviewers will only accept papers exceeding the established length threshold if the contribution is relevant. Some prestigious journals (e.g., Nature and Science) publish papers that are usually short but highly cited, so the previous arguments do not apply in this situation. However, we should bear in mind that these papers are often supported by supplementary material with a high number of pages. The literature on the association between the impact of scientific knowledge and paper length has shown mixed results: a positive association or no association between the two variables (e.g., Didegah & Thelwall, 2013b; Gargouri et al., 2010; Peng & Zhu, 2012; Peters & van Raan, 1994; van Dalen & Henkens, 2001; Vieira & Gomes, 2010).

The longer the reference list, the greater the impact of scientific knowledge may be. There are at least two reasons for this argument: intellectual content and citation-based search engines. Long reference lists may be associated with interdisciplinary research (the need to integrate knowledge from different scientific disciplines), which may increase the likelihood that a paper will be cited (Chen et al., 2021; Lariviere & Gingras, 2010): the greater the diversity of scientific disciplines cited, the larger the community that may be interested in the paper. In addition, databases such as the Web of Science and Scopus identify the citations that a paper attracts and the corresponding citing papers. Because a paper appears among the citing papers of every work it references, a paper with a long reference list is likely to be found multiple times by users browsing these lists. Of course, this is not in itself a reason for the paper to be cited more often, but it makes the paper more visible to the scientific community, which ultimately decides whether to cite it. Studies addressing the relationship between the number of references and scientific impact have found a positive correlation between the two (e.g. Bornmann et al., 2014; Chen, 2012; Didegah & Thelwall, 2013a; Haslam & Koval, 2010).

The number of authors of a paper can influence its impact through multidisciplinarity and network effects. First, the larger the number of authors, the more likely the team is multidisciplinary, so the paper may attract the attention of scientists working in a wide range of scientific disciplines. Moreover, the communication of scientific knowledge is not restricted to journals: the same knowledge can be presented at several conferences, for example, especially when multidisciplinary teams are involved, which may further increase its visibility. Second, papers with a high number of authors are likely to be cited by a large network of colleagues (Valderas, 2007).

In addition to the number of authors, the authors’ positions in the collaboration network can also influence scientific impact. The more central the authors of a particular paper are in their collaboration network, the greater the possibilities for knowledge dissemination. The more scientists are connected to the paper’s authors, directly or indirectly (as colleagues of directly connected scientists), the more likely the paper is to be cited. It is therefore not surprising that several studies have demonstrated a positive association between authors' position in the collaboration network and scientific impact (e.g. Biscaro & Giupponi, 2014; Tahmooresnejad et al., 2021; Uddin et al., 2013; Yan & Ding, 2009; Zhang et al., 2021).

The open access (OA) citation postulate is based on the idea that, of two articles of the same quality, the OA one may receive more citations than the non-OA one because it is available to a larger audience (Gargouri et al., 2010). Articles not restricted by a “paywall” can reach anyone with internet access (Harnad et al., 2008). However, studies addressing the subject have produced contradictory and inconclusive findings regarding the effect of OA on the impact of scientific knowledge: some confirmed the OA citation postulate, while others did not (e.g., Basson et al., 2021; Breugelmans et al., 2018; Craig et al., 2007; Eysenbach, 2006; McCabe & Snyder, 2014; Moed, 2007).

A prestigious journal signals high quality to its readers. In some fields, high-quality articles tend to be submitted first to prestigious journals and, if rejected, to journals with lower impact (Oster, 1980). The higher the quality of the editorial board, the higher the expected prestige of the journal, because editors have proven credentials in most cases. Editors are also partly responsible for the outputs of their journals, so they have strong motivations to provide good quality control. The literature has shown that scientific impact is associated with journal prestige (e.g., Callaham et al., 2002; Didegah & Thelwall, 2013a; Peters & van Raan, 1994).

Citation culture varies across scientific disciplines and over time (Dorta-Gonzalez et al., 2014; Moed et al., 1985). The frequency with which papers cite others varies across scientific disciplines (Moed, 2010; Zitt & Small, 2008). In Biology & Biochemistry, the average number of references per paper is higher than in Mathematics (Vieira & Gomes, 2010), so the number of citations can be expected to be higher in the former discipline.

Finally, review articles evaluate results and approaches from several scientific fields and consider the state of the art in each, which could explain why they are cited more frequently than original articles and other document types (Colebunders et al., 2014; Miranda & Garcia-Carpintero, 2018; Peters & van Raan, 1994).

Given the multiple variables that may influence scientific impact, I studied the influence on citation impact of RC, through its interaction with the development of each country's RIS, and of article features (the number of authors, pages and references, journal prestige, and the accessibility of the documents).

Variables such as those related to the persuasiveness and novelty of scientific knowledge are intangible and unmeasurable (persuasion) or difficult to measure (novelty), so I have not examined the influence of these dimensions. Several measures of novelty based on citation and text data have been developed (Shibayama et al., 2021; Uzzi et al., 2013; Wang et al., 2017). However, it has been shown that these measures are not suitable for measuring novelty (Bornmann et al., 2019; Fontana et al., 2020). The measure defined by Wang et al. (2017), which treats novelty as the first appearance of a knowledge combination, mainly captures the structural properties of the citation network, making it difficult to distinguish novel from non-novel articles, while the measure defined by Uzzi et al. (2013), which treats novelty as an atypical knowledge combination, overlaps with interdisciplinarity. The measure based on both citation and text data (Shibayama et al., 2021) is limited by the lack of publicly available word-embedding libraries for many scientific fields: such libraries exist for some fields, but not for a large number of them. Moreover, the validation of this measure showed only a weak correlation with self-reported novelty scores from a questionnaire survey, although the correlations were positive and significant.

An additional variable could be the position of the authors of a document in the collaboration network. However, I excluded this variable because of the complexity of calculating it (author name disambiguation). These limitations of the study should be taken into account when interpreting the results.

Methodology

Data

I have studied the influence of RC on the impact of scientific knowledge using publications classified as articles (articles that also received another document-type classification were not considered), across all Web of Science categories, published by scientists working in countries belonging to the EU-27 plus the UK, Norway, Switzerland, Serbia, Turkey, Iceland, North Macedonia, Ukraine and Israel. The additional countries (those not belonging to the EU-27) were chosen because the EIS 2019 provides information about the development of their RIS. The publications of these countries are the observations used in the analysis presented below.

I retrieved the articles published in 2017 (361 616 articles) by the scientists of each country from the Web of Science Core Collection (WoS). Then, for each article, I identified the type of collaboration and, in the case of IRC, kept only those articles representing a collaboration between scientists of two of the countries considered (this choice reflects the availability of information on the development of the RIS in the EIS). Also, for each article, I recorded the number of pages, the number of references, the number of authors, the quartile of the journal in which the article was published, and the accessibility of the article (OA or non-OA).

Variables and model specification

Dependent variable

The dependent variable is citation impact, represented by the percentile (percentile) occupied by each article in its Web of Science subject area. The higher the percentile, the higher the impact of the article. As the number of citations depends on the citation culture of each scientific discipline, the type of document and the publication year, I have selected a normalised indicator to represent citation impact. The percentile of a publication, available through InCites, is determined by creating a citation frequency distribution for all publications of the same year, subject category and document type, and then computing the percentage of papers at each citation level.Footnote 1 The citation window is open, i.e. the citation count is the cumulative number of citations between the publication year and March 2022. It has been argued that the number of self-citations increases with the number of authors or with RC (Glanzel & Thijs, 2004; Van Raan, 1998). However, it has been concluded that the increase in foreign citations is much larger than that in self-citations (Costas et al., 2010; Glanzel & Thijs, 2004). Moreover, it is common practice for scientists to pursue a research line that builds on their previous research. Considering this, and the fact that removing self-citations is costly, time-consuming and computationally complex, I have kept this type of citation. As mentioned in the previous section, citation culture varies among scientific disciplines and over time, and scientific impact is also related to the type of document. Therefore, by using a normalised indicator and considering only documents of the type article, I control to some extent for the influence of these factors.
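The percentile normalisation described above can be illustrated with a minimal Python sketch. It assumes one simple convention: a paper's percentile is the percentage of papers in its reference set (same publication year, subject category and document type) that received strictly fewer citations, so higher values mean higher impact. InCites' exact orientation and tie handling may differ, and the function name `citation_percentiles` and the data are hypothetical.

```python
from bisect import bisect_left
from typing import List


def citation_percentiles(citations: List[int]) -> List[float]:
    """Percentile of each paper within one reference set.

    Assumed convention: percentage of papers in the set with strictly
    fewer citations (higher = more cited). Tied papers share a value.
    """
    ordered = sorted(citations)
    n = len(ordered)
    # bisect_left returns the number of elements strictly below c.
    return [100.0 * bisect_left(ordered, c) / n for c in citations]


# Hypothetical reference set: five articles with the same year,
# subject category and document type.
ref_set = [0, 1, 1, 5, 10]
pcts = citation_percentiles(ref_set)
print(pcts)  # [0.0, 20.0, 20.0, 60.0, 80.0]

# Assumption: dividing by 100 maps the indicator onto the unit interval
# required by the fractional regression model.
unit_interval = [p / 100 for p in pcts]
```

Because the percentile depends only on a paper's rank within its reference set, it is comparable across fields with very different citation levels.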

Other normalised indicators could be used to measure scientific impact; the best known are those based on average values that use the citing-side or cited-side normalisation approaches (Bornmann, 2020; Bornmann & Marx, 2015; Waltman, 2016). The literature comparing indicators based on these two approaches has led to contradictory findings: Leydesdorff et al. (2013) observed that cited-side normalisation outperforms citing-side normalisation, whereas Waltman & van Eck (2013) and Bornmann & Marx (2015) concluded the opposite. Given these contradictory findings, and the fact that indicators comparing the number of citations with the mean expected number of citations may be strongly influenced by one or a few publications with a very large number of citations (citation distributions are highly skewed; Vieira & Gomes, 2010), I have chosen to use percentile indicators.
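The sensitivity argument can be checked with a small Python sketch on hypothetical data. It compares a simple mean-based score (a paper's citations divided by the mean of its reference set, in the spirit of average-based indicators, not any specific published formula) with a rank-based percentile, before and after adding a single very highly cited paper to the reference set. Function names and values are illustrative assumptions.

```python
from bisect import bisect_left
from statistics import mean
from typing import List


def mean_based_score(c: int, reference: List[int]) -> float:
    """Citations relative to the mean citations of the reference set."""
    return c / mean(reference)


def percentile(c: int, reference: List[int]) -> float:
    """Percentage of the reference set with strictly fewer citations."""
    ordered = sorted(reference)
    return 100.0 * bisect_left(ordered, c) / len(ordered)


base = list(range(10))        # hypothetical set with 0, 1, ..., 9 citations
with_outlier = base + [1000]  # same set plus one very highly cited paper
focal = 6                     # citations of the paper being evaluated

r0, p0 = mean_based_score(focal, base), percentile(focal, base)
r1, p1 = mean_based_score(focal, with_outlier), percentile(focal, with_outlier)
```

Here the single outlier raises the reference mean from 4.5 to 95, so the mean-based score of the focal paper collapses by more than 95%, whereas its percentile only moves from 60 to about 54.5.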

Independent variables

To understand the influence of RC, I considered for each article the development of the RIS of the countries involved (EIS), the number of pages (pages), references (references) and authors (authors), the accessibility (access), the quartile (quartile) of the journal in which the article was published according to the Journal Impact Factor (JIF), and the type of collaboration (collaboration). The limitations of the JIF are widely discussed in the literature (Bornmann et al., 2012a, 2012b). An interesting indicator that overcomes many of the caveats of the JIF is the recently introduced Journal Citation Indicator (JCI), a field-normalised indicator. In future studies, it will be interesting to use this indicator, although I do not expect it to behave substantially differently, as there is a 67% overlap between the sources in the first quartile according to the JIF and the JCI.

As a proxy for the development of the RIS, I have used the classification of each country in the EIS 2019 (the most recent report when I started this study) for the year 2017. The annual EIS provides a comparative assessment of the research and innovation performance of EU Member States through a composite indicator (the Summary Innovation Index) that takes into account the main drivers of innovation, public and private sector investment, the different aspects of innovation in the business sector and the impact of business innovation activities (Hollanders et al., 2020). In short, the indicator assesses the relative strengths and weaknesses of national innovation systems and helps countries identify areas they need to address. Four main types of indicators are considered in the calculation of the Summary Innovation Index: Framework conditions, Investments, Innovation activities, and Impacts (Hollanders, 2019). Framework conditions capture the main drivers of innovation performance, taking into account human resources, the attractiveness of research systems and the environment in which enterprises operate, measuring, for example, the extent to which individuals pursue entrepreneurial activities when they see new opportunities arising from innovation. Investments capture investments made in the public and business sectors: the availability of finance for innovation projects (venture capital expenditures), government support for research and innovation activities (R&D expenditures in universities and government research institutions), and firms' R&D and non-R&D investments and their efforts to improve the ICT skills of their employees. Innovation activities capture the share of firms that have introduced innovations to the market, collaboration between innovative firms, research collaboration between the private and public sectors, the extent to which the private sector funds public R&D activities, and the intellectual property rights generated. Finally, Impacts capture the effects of firms' innovation activities on employment and sales. The list of 27 indicators and their calculation, not described here due to space limitations, can be found in Hollanders (2019).
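To make the idea of a composite index more concrete, the following Python sketch aggregates several indicators by min-max normalising each one across countries and then taking an unweighted average per country. This is a deliberately simplified, generic illustration under stated assumptions; it is not the EIS procedure documented in Hollanders (2019), which involves additional steps. Country labels, indicator values and function names are hypothetical.

```python
from typing import Dict, List


def min_max_normalise(values: Dict[str, float]) -> Dict[str, float]:
    """Rescale one indicator to [0, 1] across countries (min -> 0, max -> 1)."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    # Assumption: if all countries share the same value, give each a neutral 0.5.
    return {c: (v - lo) / span if span else 0.5 for c, v in values.items()}


def composite_index(indicators: List[Dict[str, float]]) -> Dict[str, float]:
    """Unweighted average of the normalised indicators for each country."""
    normalised = [min_max_normalise(ind) for ind in indicators]
    countries = normalised[0].keys()
    return {c: sum(n[c] for n in normalised) / len(normalised) for c in countries}


# Hypothetical values for three countries (A, B, C) on three indicators.
indicators = [
    {"A": 2.0, "B": 1.0, "C": 0.5},
    {"A": 40.0, "B": 30.0, "C": 10.0},
    {"A": 0.8, "B": 0.9, "C": 0.2},
]
scores = composite_index(indicators)
# A ranks first and C last; C is the minimum on every indicator, so it scores 0.
```

A country can lead on one indicator (B on the third) and still rank below another country overall, which is the sense in which a composite summarises relative strengths and weaknesses.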

The variable EIS is equal to 1, 2, 3, or 4 if the country is classified as a modest innovator (Bulgaria, Croatia, North Macedonia, Romania, Turkey, Ukraine), moderate innovator (Cyprus, Czechia, Estonia, Greece, Hungary, Italy, Latvia, Lithuania, Malta, Poland, Portugal, Serbia, Slovakia, Slovenia, Spain), strong innovator (Austria, Belgium, France, Germany, Iceland, Ireland, Israel, Luxembourg, Norway, UK), and innovation leader (Denmark, Finland, Netherlands, Sweden, Switzerland), respectively.

Access is 1 if the article is OA and 0 otherwise.

Quartile is 1 if the article was published in a journal belonging to the first quartile in the year of publication of the article, and 0 otherwise. If a journal is classified in more than one Web of Science category and occupies different quartiles across them, I have considered its best quartile.
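The "best position" rule for journals listed in several categories can be expressed in a few lines of Python. This is a minimal sketch assuming quartiles are coded 1 (best) to 4 (worst) for the relevant publication year; the function name and the example categories are hypothetical.

```python
from typing import Dict


def first_quartile_dummy(quartiles_by_category: Dict[str, int]) -> int:
    """Return 1 if the journal's best quartile across categories is Q1, else 0."""
    best = min(quartiles_by_category.values())  # lowest number = best quartile
    return 1 if best == 1 else 0


# Hypothetical journal listed in two Web of Science categories in 2017.
journal = {"Economics": 2, "Information Science & Library Science": 1}
print(first_quartile_dummy(journal))  # 1
```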

Collaboration is 0, 1, 2, 3, 4 or 5 if the article has one author (no collaboration), a domestic collaboration (a collaboration between scientists working at different institutions located in the same country), a collaboration with a modest innovator (EIS = 1), a collaboration with a moderate innovator (EIS = 2), a collaboration with a strong innovator (EIS = 3), or a collaboration with an innovation leader (EIS = 4), respectively.

To identify articles with DRC, I disregarded all articles meeting both of the following conditions: (1) they involve collaboration between scientists working at different institutions located in the same country and (2) they include foreign scientists. I adopted this procedure to avoid contaminating the estimated effect of DRC with that of IRC. The articles with DRC were identified using the InCites platform. As for the articles with IRC, and given that I wanted to control for the effect of the development of the RIS, I only considered as IRC those articles with scientists from exactly two countries, the collaborating country being one of the 36 mentioned above. I adopted this procedure because EIS data are only available for a limited number of countries.
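The coding of the variables EIS and Collaboration can be sketched in Python as follows. The country-to-group mapping is taken from the list above; everything else is an assumption made for illustration: the hypothetical function `collaboration_code` codes each article from the perspective of one focal country, cases not covered by the text (e.g. multi-author articles from a single institution, more than two countries, or a partner outside the 36 countries) return `None`, and any two-country article with an eligible partner is coded as IRC.

```python
from typing import Optional, Set

# EIS 2019 groups as listed in the text (1 = modest ... 4 = innovation leader).
EIS_GROUP = {
    **dict.fromkeys(["Bulgaria", "Croatia", "North Macedonia", "Romania",
                     "Turkey", "Ukraine"], 1),
    **dict.fromkeys(["Cyprus", "Czechia", "Estonia", "Greece", "Hungary",
                     "Italy", "Latvia", "Lithuania", "Malta", "Poland",
                     "Portugal", "Serbia", "Slovakia", "Slovenia", "Spain"], 2),
    **dict.fromkeys(["Austria", "Belgium", "France", "Germany", "Iceland",
                     "Ireland", "Israel", "Luxembourg", "Norway", "UK"], 3),
    **dict.fromkeys(["Denmark", "Finland", "Netherlands", "Sweden",
                     "Switzerland"], 4),
}


def collaboration_code(focal: str, countries: Set[str], n_authors: int,
                       n_institutions: int) -> Optional[int]:
    """Collaboration category of an article as seen from the focal country."""
    if n_authors == 1:
        return 0  # no collaboration
    if countries == {focal}:
        # DRC: different institutions in the same country, no foreign authors.
        return 1 if n_institutions > 1 else None
    if len(countries) == 2 and focal in countries:
        (partner,) = countries - {focal}
        if partner in EIS_GROUP:
            return 1 + EIS_GROUP[partner]  # 2..5 by the partner's EIS group
    return None  # outside the five categories: excluded in this sketch


# A Spain-Germany article is coded differently for each country.
print(collaboration_code("Spain", {"Spain", "Germany"}, 4, 2))    # 4
print(collaboration_code("Germany", {"Spain", "Germany"}, 4, 2))  # 3
```

Coding the same article once per country reflects the text's statement that the observations are the publications of each country; how the original analysis handled this is not specified here.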

Model specification

The dependent variable takes values in the unit interval [0, 1]; therefore, I have resorted to fractional regression models, which have been used in many economic settings, for example to model employee participation rates in pension plans (Papke & Wooldridge, 1996). Ordinary least squares (OLS) regression might seem attractive, but a strong drawback of this method is that its predicted values are not guaranteed to lie in the interval [0, 1] (Papke & Wooldridge, 1996). Fractional models are instead estimated by maximising the Bernoulli quasi-log-likelihood function:

$$\ln L=\sum_{j=1}^{N}\left[w_{j}y_{j}\ln\left\{G\left(X_{j}^{\prime}\beta\right)\right\}+w_{j}\left(1-y_{j}\right)\ln\left\{1-G\left(X_{j}^{\prime}\beta\right)\right\}\right]$$

where \(N\) is the sample size, \(y_j\) the dependent variable for publication \(j\), \(w_j\) optional weights, \(X_j\) the vector of independent variables for publication \(j\), and \(G(\cdot)\) the link function. In the fractional probit model used here, \(G\left(X_j^{\prime}\beta\right)=\Phi\left(X_j^{\prime}\beta\right)\), where \(\Phi\) is the standard normal cumulative distribution function. For more details on the model, see Papke & Wooldridge (1996) and Wooldridge (2010).

For the full model, the conditional mean of the dependent variable (percentile) for publication \(j\) is specified as:

$$E\left({Percentile}_{j}\mid X_{j}\right)=\Phi\left(\beta_{0}+\beta_{1}{EIS}_{j}+\beta_{2}{Pages}_{j}+\beta_{3}{References}_{j}+\beta_{4}{Authors}_{j}+\beta_{5}{Access}_{j}+\beta_{6}{Quartile}_{j}+\beta_{7}{EIS}_{j}\times{Collaboration}_{j}\right)$$

I estimated the model using the fracreg command in Stata version 16.0, with robust standard errors.
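To make the estimation criterion concrete, the quasi-log-likelihood defined earlier in this section can be written out directly. This is an illustrative pure-Python sketch, not the Stata fracreg implementation; the function names are hypothetical, and clipping the predicted probabilities away from 0 and 1 is an assumption added only so that the logarithms are always defined.

```python
import math
from typing import Optional, Sequence


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def frac_probit_loglik(beta: Sequence[float],
                       X: Sequence[Sequence[float]],
                       y: Sequence[float],
                       w: Optional[Sequence[float]] = None) -> float:
    """Bernoulli quasi-log-likelihood of a fractional probit model:
    sum_j w_j*y_j*ln G(x_j'b) + w_j*(1 - y_j)*ln(1 - G(x_j'b)), with G = Phi."""
    if w is None:
        w = [1.0] * len(y)
    eps = 1e-12  # keeps G strictly inside (0, 1) so the logs are defined
    total = 0.0
    for x_j, y_j, w_j in zip(X, y, w):
        if not 0.0 <= y_j <= 1.0:
            raise ValueError("fractional outcomes must lie in [0, 1]")
        g = norm_cdf(sum(b * x for b, x in zip(beta, x_j)))
        g = min(max(g, eps), 1.0 - eps)
        total += w_j * y_j * math.log(g) + w_j * (1.0 - y_j) * math.log(1.0 - g)
    return total


# Intercept-only example: the criterion is maximised where Phi(b) = mean(y).
X = [[1.0], [1.0]]
y = [0.3, 0.7]  # mean 0.5, so the maximiser is b = 0
print(frac_probit_loglik([0.0], X, y))
```

Maximising this function over the coefficients with a numerical optimiser gives the fractional probit estimates. Because the Bernoulli likelihood is only a quasi-likelihood for outcomes strictly between 0 and 1, inference relies on robust (sandwich) standard errors.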

To interpret the results of the regression model, I have resorted to predictive margins and marginal effects. Predictive margins are statistics calculated from a fitted model at fixed values of some independent variables, averaging (or otherwise integrating) over the remaining independent variables. Marginal effects are partial derivatives of the fitted model with respect to each variable, computed for each unit in the dataset (StataCorp., 2019); for binary and categorical variables, they are discrete changes relative to a base level.
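The two quantities can be illustrated for a fractional probit with a linear index and no interaction terms. The sketch below is hypothetical: the coefficient names and values are made up for illustration and are not the estimates in Table 3, and it omits the delta-method standard errors that Stata's margins command also reports. For a continuous variable, the average marginal effect averages the derivative phi(x'b) * b_k over all units; for a dummy, it is the difference between the predictive margins at 1 and at 0.

```python
import math
from typing import Dict, List

Row = Dict[str, float]


def norm_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def index(beta: Row, row: Row) -> float:
    """Linear index x'b; the intercept is stored under the key "_cons"."""
    return beta["_cons"] + sum(beta[k] * v for k, v in row.items())


def predictive_margin(beta: Row, data: List[Row], var: str, value: float) -> float:
    """Average predicted mean with `var` fixed at `value` for every unit."""
    return sum(norm_cdf(index(beta, {**row, var: value})) for row in data) / len(data)


def ame_continuous(beta: Row, data: List[Row], var: str) -> float:
    """Average over units of d Phi(x'b) / d var = phi(x'b) * b_var."""
    return sum(norm_pdf(index(beta, row)) * beta[var] for row in data) / len(data)


def ame_dummy(beta: Row, data: List[Row], var: str) -> float:
    """Discrete change from the base level var = 0 to var = 1."""
    return (predictive_margin(beta, data, var, 1.0)
            - predictive_margin(beta, data, var, 0.0))


# HYPOTHETICAL coefficients and data, for illustration only.
beta: Row = {"_cons": -0.6, "pages": 0.01, "access": 0.05}
data: List[Row] = [
    {"pages": 8.0, "access": 0.0},
    {"pages": 12.0, "access": 1.0},
    {"pages": 20.0, "access": 0.0},
]
print(predictive_margin(beta, data, "access", 1.0))
print(ame_continuous(beta, data, "pages"))
print(ame_dummy(beta, data, "access"))
```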

Results

Descriptives

In the dataset, the average numbers of pages, references and authors per article are 12.58, 41.60 and 4.74, respectively, and the medians are lower, indicating that the distributions are positively skewed (Table 1). Some articles have a very high number of pages or references. I analysed the articles with a high number of pages and found no particular reason for their length, although in some cases the references take up several pages (for example, the article with 1476 references has 60 pages, 48 of which contain only references). However, this is not the case for all articles with hundreds of pages (the article with 567 pages has 120 references). The hyperauthorship (Cronin, 2001) of the article with 1230 authors relates to a collaboration among the JET, EUROfusion MST1 and ASDEX Upgrade teams, i.e., a collaboration on a macroscopic scale. The dataset is also very heterogeneous with respect to the variable percentile (coefficient of variation of 62%). These special observations were not removed because they are true outliers, i.e., they represent natural variation in the sample and did not result from measurement, data entry or processing errors.
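The summary indicators used in this paragraph are easy to reproduce. A minimal sketch, assuming the sample standard deviation (the text does not say which variant was used); the input values are hypothetical. A mean above the median is only a rough signal of positive skew.

```python
import statistics
from typing import Dict, Sequence


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Mean, median and coefficient of variation of a variable."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # ASSUMPTION: sample SD (n - 1 denominator)
    return {
        "mean": float(mean),
        "median": float(statistics.median(values)),
        "cv": sd / mean,  # e.g. 0.62 corresponds to a CV of 62%
    }


# Hypothetical page counts with one very long article: mean > median.
pages = [8, 10, 11, 12, 14, 60]
summary = describe(pages)
print(summary)
print(summary["mean"] > summary["median"])
```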

Table 1 Descriptive statistics for pages, references, authors, and percentile

More than 55% of the articles are non-OA, and about 36% were published in journals in the first quartile in 2017 according to the JIF (Table 2). The distribution of articles by level of development of the RIS is unbalanced, with articles from countries classified as strong innovators being the most numerous (Table 2). The types of collaboration are also unevenly distributed, with articles with DRC being the most common (43%). The average percentile shows that the impact is greater for OA articles (0.55 on average) than for non-OA articles (Table 2). Articles in journals classified in the first quartile are cited more frequently than articles in journals in the other quartiles (average percentile of 0.65). As expected, countries with more developed RIS produce articles with greater impact (Table 2). As far as the type of collaboration is concerned, the behaviour in terms of scientific impact is mixed. For a detailed analysis of these patterns, see Sect. “Model”.

Table 2 The proportion of articles by levels of access, quartile, EIS and collaboration, and the average percentile for each category

Model

In Table 3, I present the results of the regression for three models: a model with the variables representing the articles' features (model 1), a model with the articles' features and the level of development of the RIS of each country analysed (model 2), and the full model with all the variables (model 3). Of the three models, the Akaike information criterion (AIC) indicates that the full model best describes the relationship between the dependent and independent variables.
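For reference, the AIC balances goodness of fit against the number of estimated parameters \(k\):

$$\mathrm{AIC}=-2\ln L+2k$$

where \(\ln L\) is the maximised log-likelihood of the model; the model with the lowest AIC is preferred. Because the fractional probit is estimated by quasi-maximum likelihood, \(\ln L\) is here the Bernoulli quasi-log-likelihood, so the comparison is best read as a heuristic guide rather than a formal test.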

Table 3 Fractional regression model

The higher the number of pages, references and authors, the higher the scientific impact of articles, as shown by the positive and statistically significant coefficients for these variables. Furthermore, articles that are OA have a higher scientific impact than non-OA articles (positive and statistically significant coefficient). Articles published in journals in the first quartile have a higher scientific impact than articles published in journals in the other quartiles (positive and statistically significant coefficient). These results are observed in all the models.

Regarding the effect of EIS and collaboration, the results are more difficult to interpret. In general, we observe positive and statistically significant coefficients. Exceptions are (1) the interaction representing a collaboration between an innovation leader and a modest innovator, which has a negative and statistically significant coefficient, and (2) the interaction between a strong innovator and a modest innovator, which has a positive coefficient but is not statistically significant.

To compare and discuss the effects of each variable and of the interaction term, I have resorted to predictive margins and marginal effects.

Table 4 shows that the gains in impact for a small increase in pages, references and authors are 0.00172, 0.00256 and 0.00488, respectively. The percentile (conditional mean) increases by 0.0155 when we move from a non-OA article to an OA article. For articles published in journals in the first quartile, the impact increases by 0.163 compared to articles published in journals in the other quartiles; of these variables, quartile has the largest effect on scientific impact. More detailed information about the gains for pages, references and authors can be obtained by evaluating the marginal effects at different values of these variables.

Table 4 Marginal effects for pages, references, authors, access and quartile. For the dummies, the base level is 0

As the number of pages, references and authors per article increases, the conditional mean of percentile also increases (Figs. 1, 2, 3, left-hand panels), as expected from the positive and statistically significant coefficients in Table 3. However, the most interesting results emerge from the marginal effects (Figs. 1, 2, 3, right-hand panels). For pages, the effect is practically constant: it is the same (an increase in scientific impact of 0.00172) for articles with six pages as for articles with eleven pages. For references, the effect increases up to about 40–50 references per article (the impact increases by 0.00258 for articles with 10 references and by 0.00264 for articles with 40 references) and then starts to decrease (for an article with 100 references, the impact increases by 0.0024). A similar pattern is observed for authors: the effect increases, although only slightly, up to six authors per article (0.004906 for an article with two authors and 0.004911 for an article with six authors) and then starts to decrease (0.00470 for an article with 30 authors).
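One mechanical reading of this rise-then-fall pattern, offered as an illustration rather than as the author's explanation, follows from the probit link. For a continuous regressor \(x_k\) entering the index linearly with coefficient \(\beta_k\), the average marginal effect evaluated at \(x_k=a\) is

$$\mathrm{ME}_k(a)=\frac{1}{N}\sum_{j=1}^{N}\beta_k\,\phi\left(X_j^{\prime}\beta+\beta_k\left(a-x_{jk}\right)\right)$$

where \(\phi\) is the standard normal density. Since \(\phi\) peaks at 0 and is symmetric, each term increases with \(a\) (for \(\beta_k>0\)) while the index is negative, i.e., while the predicted percentile is below 0.5, and decreases once the index is positive. The effect can therefore first increase and then decrease even though the index is linear in the regressor; when \(\beta_k\) is small relative to the spread of the indices, \(\mathrm{ME}_k(a)\) changes very little over the observed range.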

Fig. 1

The conditional mean of percentile (left-hand panel) and effects on the conditional mean of percentile (right-hand panel) of pages. The bars represent 95% confidence intervals (CIs). I chose to limit the number of pages to 21 because 88% of the articles have 21 or fewer pages

Fig. 2

The conditional mean of percentile (left-hand panel) and effects on the conditional mean of percentile (right-hand panel) of references. The bars represent 95% CIs. I chose to limit the number of references to 100 because 97% of the articles have 100 or fewer references

Fig. 3

The conditional mean of percentile (left-hand panel) and effects on the conditional mean of percentile (right-hand panel) of authors. The bars represent 95% CIs. I chose to limit the number of authors to 30 because 99.9% of the articles have 30 or fewer authors

The effects of EIS and collaboration are best understood by looking at Figs. 4 and 5. The more developed the RIS, the higher the scientific impact of articles without collaboration and of articles with DRC; for example, the scientific impact of single-authored articles is 0.338 for scientists from modest innovators and 0.511 for scientists from innovation leaders (Fig. 4, left-hand panel). Fig. 4 also shows that collaborative activities do not always lead to articles with higher scientific impact than single-authored articles (H1 is partially confirmed). For scientists from strong innovators collaborating with scientists from modest innovators, the coefficient is not statistically significant, as noted above, and Fig. 4 shows a large overlap between its CI and that of articles without collaboration. For scientists from innovation leaders collaborating with scientists from modest innovators, the impact of the resulting articles (0.456) is below that of the single-authored articles of scientists from innovation leaders (0.511; Fig. 4, left-hand panel); see also the negative effect in Fig. 4 (right-hand panel) and the statistically significant coefficient in Table 3. Similar results have been found in a previous study (Chinchilla-Rodriguez et al., 2019).

Fig. 4

The conditional mean of percentile (left-hand panel) and effects on the conditional mean of percentile (right-hand panel) of collaboration according to EIS. For the graph of the effects, the base category is collaboration equal to 0, i.e., no collaboration. The bars represent 95% CIs

Fig. 5

The effects on the conditional mean of percentile of collaboration according to EIS. The base category is collaboration equal to 1, i.e., DRC. The bars represent 95% CIs

For scientists from any country i (regardless of its RIS), the effect on scientific impact of collaborating with scientists from innovation leaders is always positive compared with articles without collaboration, and this effect decreases as the EIS of country i increases from moderate innovator to innovation leader (increases of 0.207 and 0.063, respectively; Fig. 4, right-hand panel).

In short, when scientists from two different countries (i and j) collaborate with scientists from the same country, say country A (whose RIS is more developed than those of i and j), the scientists from the country (i or j) with the more developed RIS benefit less from IRC. The exception is the pair of modest and moderate innovators: for scientists from moderate innovators, the gain from working with scientists from strong innovators or innovation leaders is always higher than the corresponding gain for scientists from modest innovators. Also, for scientists from modest innovators, collaboration with scientists from a country whose RIS is in the same category leads to articles with the highest impact (0.501; Fig. 4, left-hand panel).

For scientists from moderate innovators, strong innovators and innovation leaders, collaborations with scientists from modest innovators lead to articles with a lower scientific impact than collaboration of type 1, i.e., DRC, and the effect becomes larger in magnitude (more negative) as the EIS increases (e.g., for scientists from innovation leaders collaborating with scientists from modest innovators, the scientific impact decreases by 0.096 relative to their articles with DRC; Fig. 5). Likewise, a collaboration between scientists from innovation leaders and scientists from moderate innovators leads to articles with a lower impact than the articles with DRC published by scientists from innovation leaders (0.539 and 0.553, respectively; Fig. 4). For scientists from innovation leaders collaborating with scientists from strong innovators, the impact is slightly higher than that of their articles with DRC, but the effect is not statistically significant (Fig. 5). Finally, the impact of articles generated in a collaboration between scientists from strong innovators and scientists from moderate innovators increases by 0.0195 compared with the articles with DRC of scientists from strong innovators (Fig. 5). Therefore, H2 is partially confirmed.

For scientists who collaborate with others from countries with more developed RIS, the scientific impact of the resulting articles is higher than that of their articles with DRC (H3 is confirmed; Fig. 4, left-hand panel). The magnitude of this effect tends to decrease as the scientists' own country moves from modest innovator towards strong innovator (Fig. 5). For scientists from modest innovators who collaborate with scientists from innovation leaders, the impact of the articles increases by 0.0602 relative to their articles with DRC, whereas for scientists from strong innovators working with scientists from innovation leaders, the impact increases by 0.0498 (Fig. 5).

Finally, although I did not formulate a hypothesis about the expected behaviour when scientists collaborate with others from a country whose EIS is in the same category, the results show that the scientific impact is always higher than that of their articles with DRC; for example, for scientists from moderate innovators collaborating with scientists from countries in the same category, the impact is 0.535, compared with 0.504 for their articles with DRC (Fig. 4).
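Because the effects in Figs. 4 and 5 compare predictive margins of two collaboration types within the same EIS category, they can be approximately recovered as differences between the margins quoted in this section. The sketch below uses only values reported in the text; the dictionary layout, keyed by (EIS, Collaboration) codes as defined in the data section, is an illustrative choice. Small discrepancies from the reported effects are expected because the margins are rounded to three decimals.

```python
# Predictive margins (conditional mean of percentile) quoted in the text,
# keyed by (EIS of the scientists' country, Collaboration code).
# Collaboration: 0 = none, 1 = DRC, 2 = with modest innovator,
# 3 = with moderate innovator (codes as defined in the data section).
MARGINS = {
    (4, 0): 0.511,  # innovation leaders, single-authored
    (4, 1): 0.553,  # innovation leaders, DRC
    (4, 2): 0.456,  # innovation leaders with modest innovators
    (4, 3): 0.539,  # innovation leaders with moderate innovators
    (2, 1): 0.504,  # moderate innovators, DRC
    (2, 3): 0.535,  # moderate innovators with moderate innovators
}


def contrast(eis: int, collab: int, base: int) -> float:
    """Difference in predictive margins between `collab` and `base`
    for scientists from countries in EIS category `eis`."""
    return MARGINS[(eis, collab)] - MARGINS[(eis, base)]


for eis, collab, base in [(4, 2, 0), (4, 2, 1), (4, 3, 1), (2, 3, 1)]:
    print(f"EIS {eis}, collaboration {collab} vs {base}: "
          f"{contrast(eis, collab, base):+.3f}")
```

The first contrast (-0.055) is the negative effect of collaboration with modest innovators for scientists from innovation leaders relative to their single-authored articles; the second (-0.097) differs from the decrease of 0.096 read from Fig. 5 only by an amount consistent with rounding.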

Conclusions

The benefits of RC have been widely explored by the scientific community. Among these benefits, the effect on scientific impact, i.e., the citations that scientific knowledge attracts, has received great attention. However, the extant literature has not considered that the effect of the different types of RC on scientific impact depends on the development of the RIS of each country. I have explored this issue for the articles of 36 countries (the EU-27 countries plus the UK, Norway, Switzerland, Serbia, Turkey, Iceland, North Macedonia, Ukraine and Israel), controlling for other variables that also influence impact (the number of pages, references and authors, the accessibility of the article, and the quartile of the journal in which it was published). The findings of this study have a number of implications for both theory and practice, although they must be interpreted in light of the limitations discussed in the section “Limitations and future research”.

Contrary to much of the literature, which states that RC produces publications with higher scientific impact than non-collaborative research, the findings of this study show that the impact depends on the development of the RIS of the collaborating countries. RC between scientists from innovation leaders and those from modest innovators results in articles with lower impact than the single-authored articles of scientists from innovation leaders. Others have also shown that, for some countries with high R&D expenditure, DRC leads to lower scientific impact than single-authored papers (e.g., countries with R&D expenditure equal to or above 2% of GDP; see Fig. 3 in Chinchilla-Rodriguez et al., 2019).

It has been argued that IRC generates publications of higher impact than DRC. Along the same lines as the previous point, the results show that the development of the RIS of the collaborating countries should be taken into account. For example, a collaboration between scientists from moderate innovators (or strong innovators, or innovation leaders) and scientists from modest innovators yields articles with lower impact than the articles with DRC of scientists from moderate innovators (or strong innovators, or innovation leaders, respectively). Conversely, a collaboration between scientists from modest innovators (or moderate innovators, or strong innovators) and scientists from innovation leaders generally leads to articles with higher impact than the articles with DRC of scientists from modest innovators (or moderate innovators, or strong innovators, respectively).

When scientists from two countries (i and j) collaborate with scientists from the same country, whose RIS is more developed than those of i and j, the scientists from the country (i or j) with the more developed RIS generally benefit less from IRC, whether the increase in impact is measured relative to their single-authored articles or to their articles with DRC. For example, the gains (marginal effects, in statistical terminology) are higher for articles written by scientists from modest innovators in collaboration with scientists from innovation leaders than for articles written by scientists from strong innovators in collaboration with scientists from innovation leaders. The exception is the pair of modest and moderate innovators: when the baseline is single-authored articles, collaboration with scientists from countries with a more developed RIS always leads to higher gains in scientific impact for scientists from moderate innovators than for scientists from modest innovators. This is an interesting case for further study.

As for the other variables, it has been stated that the higher the number of pages, references and authors per article, the higher the scientific impact. The findings of this study agree with those of previous studies. However, broader conclusions can now be drawn, as the study shows that the gain in impact from a small increase in these variables is not necessarily constant. For the number of pages, the gains remain roughly the same as the number of pages per article increases. For references, the gains increase from articles with 10 references to articles with 40 references and then start to decrease. For the number of authors, the gains increase up to six authors per article and decrease for higher values.

Regarding access and quartile, I have found that OA articles and articles published in journals in the first quartile have a higher scientific impact than non-OA articles or articles published in journals in the other quartiles. Among pages, references, authors, access and quartile, quartile has the largest effect on scientific impact. Comparing these effects with those of EIS and collaboration is not straightforward, as the patterns observed differ across the combinations of these two variables; each scenario must be evaluated individually.

These findings are relevant for scientists and policymakers who design and implement policies to promote RC. Judging by scientific impact alone, one might conclude that a country with a well-developed RIS should avoid policies that encourage collaboration with countries with less-developed RIS. However, this conclusion is too narrow, because the goals of RC are not limited to increasing scientific impact. In many cases, these policies aim to address societal issues and challenges through research, maintain good and stable diplomatic relations, and promote higher education institutions abroad. Achieving these goals may require collaboration with scientists in countries with less-developed RIS. The strategies should therefore aim at a balance: if policies pursuing these goals involve collaboration with countries with less-developed RIS, other policies should compensate for the less positive effect of this type of collaboration on the scientific impact of a country with a well-developed RIS.

The results for access could lead to policies that reward scientists who publish their research in OA. Indeed, open science is increasingly embedded in policies and expected in practice for a variety of reasons (transparency, accountability, equity, and collaboration in knowledge production through increased access to research results; Cole et al., 2022). However, such policies should take into account studies suggesting that OA publishing involving article processing charges (APCs) is skewed toward scientists with greater access to resources (scientists in high-income countries) and job security (Olejniczak & Wilson, 2020), and that scientists in the Global South frequently cite APCs as a financial barrier (Smith et al., 2022).

Limitations and future research

The limitations of this study relate to the data and to the variables.

As for the data, the reader should bear in mind that I have analysed a specific set of countries (EU-27 plus the UK, Norway, Switzerland, Serbia, Turkey, Iceland, North Macedonia, Ukraine and Israel), as the Summary Innovation Index is only available for these countries. Therefore, the results cannot be generalised to other countries. I retained in the data the articles that represent hyperauthorship or collaboration at a macroscopic scale, although these are special cases, and future studies on this topic should consider whether such articles should be included. The number of observations varies considerably across the many possible combinations of the variables EIS and collaboration (Table 1, supplementary material); however, a more balanced distribution is not easy to achieve. The IRC observations refer only to bilateral collaborations within the set of 36 countries considered, which limits the scope of the conclusions drawn in this study. For publications with multilateral collaboration (more than two countries), it is difficult to obtain a clear picture of the influence of the collaborating country's level of RIS development. Moreover, papers involving scientists from more than two countries were not included, and they could represent a considerable number of publications.

As for the variables, it should be noted that some variables (persuasion, novelty and authors' position in the research network) that could influence the citation impact were not considered in this study for several reasons: they are intangible and unmeasurable (persuasion), difficult to measure given the underlying concept (novelty), and difficult to determine given the need to access large amounts of data and the complexity of the algorithms to be developed (novelty and authors' position in the research network).

As for further research, it would be valuable to extend this analysis to bilateral collaborations between countries other than those considered here. However, this would require an indicator of the level of development of the RIS that can be applied to all countries, which is a very challenging task. Such studies would make it possible to determine the extent to which the results of this study can be generalised.

The results have shown that the more similar the levels of development of the RIS of the two collaborating countries, the smaller the gain in impact of the articles. The exception is articles by scientists from countries that are modest innovators: their gain in impact is lower than that of articles by scientists from countries considered moderate innovators. An interesting research question could therefore be: is there a maximum distance between the levels of development of the RIS of two collaborating countries beyond which the gains start to diminish?