Introduction

With the exception of rare instances of fundamental, paradigmatic shift (Kuhn, 1966), many aspects of scientific inquiry constitute iterative processes in which new insights are obtained through the recombination and synthesis of prior knowledge. Only in the wake of such recombination is it possible for new hypotheses to be developed and subsequently tested to obtain new insights and derive new theories (Brody, 1993; Elliot, 2012). By grounding their research in a (re)combination of prior knowledge, scientists add to our existing knowledge base in a process which is not only cumulative but also emergent, i.e. one in which results transcend the substance of their constituent parts (Elliot, 2012; Šešelja et al., 2020). At the same time, science is a social process of mutual perception, affirmation, and debate in which new opportunities to create knowledge can only arise if its carriers are aware of each other and consciously choose to interact as knowledge agents. Consequently, the scientific system depends on both formal and informal communication, during the discovery process as well as for the subsequent communication of discoveries (Šešelja et al., 2020; Zollman, 2013). As one major channel of communication within the science system, academic publications document recent developments and lay the foundation for the further expansion of the knowledge system.

The scientific creation of knowledge thus requires interpersonal exchanges for three main reasons. First, such exchanges give scientists access to previous findings, including their implicit dimensions, by maintaining contact with relevant colleagues (Šešelja et al., 2020). Second, they enable scientists to form teams with such colleagues in order to effectively recombine existing knowledge from different sources and produce novel insights (Zollman, 2017). Third, they allow scientists to communicate their own findings through diverse channels so that these can serve as a basis for subsequent studies, thus fuelling the recursive process of science (Loroño-Leturiondo & Davies, 2018; Entradas, 2022).

Against this background, much of the conceptual literature has posited that scientists with better networks should—all else being equal—be more productive and better placed to create high-quality, high-impact research (Šešelja et al., 2020). Although some simulation studies initially suggested that excessive connectivity may have its drawbacks (Zollman, 2007, 2010, 2013), these findings have since been qualified by more detailed simulation approaches. The more recent models account for the positive effects of serendipity, i.e. of random discoveries which result from meetings without a predefined purpose (Frey & Šešelja, 2020; Kummerfeld & Zollman, 2016). They now unambiguously support the notion that connectivity is beneficial. In addition to such simulations, several empirical studies have corroborated the benefits of connectivity, primarily on the basis of co-publication activities (Angere & Erik, 2017). Further research has drawn on information related to funding acknowledgements and to peer interactive communication (sub-authorship), often arriving at similar conclusions (Álvarez-Bornstein & Bordons, 2021; Álvarez & Caregnato, 2021; Álvarez-Bornstein & Montesi, 2020; Díaz-Faes & Bordons, 2017; Costas & van Leeuwen, 2012).

Despite their breadth, however, these studies can only account for a subset of relevant interactions across the gamut of scientific interactions. Among the large array of possible communication channels which enable both the exploration of epistemic landscapes (Grim, 2009; Grim et al., 2013) and debates about opposing scientific positions (Borg et al., 2019), any analysis based on what is formally documented in a publication’s text captures only a fraction of interpersonal exchanges, however broad the interpretation. Co-publications, in particular, merely document the ultimate manifestation of collaborations towards or after the end of joint research activities. This neglects earlier, less formal types of personal exchanges, which gave rise to the joint research effort in the first place.

In fact, it is quite common for a good share of relevant exchanges of ideas to never reach the stage of a joint, fully peer-reviewed journal publication at all. Among other reasons, funding may not materialise for all parties, initial members of a discussion circle may assume new and time-consuming positions elsewhere or simply lose interest, or the joint manuscript may get caught up in an extended review process and never get published. In all three cases, a substantial exchange of knowledge has taken place nevertheless, often to the extent that initial findings from joint activities have been made available for discussion. Quite evidently, effective intellectual cross-fertilisation between scientists starts long before it is ever formally documented in a co-publication.

Against this background, this paper seeks to broaden our conceptual understanding of interactions in the scientific domain. Currently, most studies continue to focus on co-publications and other publication-related indicators such as citations, acknowledgements, or funding references. Certainly, these additional indicators have substantially widened our perspective with regard to further types of relationships which contribute to the scientific process (Álvarez-Bornstein & Montesi, 2020). Although they refer directly to what is documented in publications, they can also be considered proxies for the underlying processes which they indirectly reflect (Díaz-Faes & Bordons, 2017). Nonetheless, they have not truly extended our analytical reach to include the informal exchanges between scientists which precede the publication process.

Even though the importance of these complementary, informal activities has long been acknowledged, both conceptually and in qualitative case studies (Viglione, 2020; Torre & Rallet, 2005), very few quantitative studies have focused on the informal, personal aspect of interactions between scientists. That relatively little attention has been paid to this aspect in overarching, indicator-based studies so far is mostly due to a lack of suitable measurements. While the abovementioned indicators yield important additional insights, their interpretation rests on attributions by authors rather than on the analyst’s own conceptual framing.

In this paper, we develop and test an alternative approach to capture informal interactions in the scientific domain. It is based on the conceptual juxtaposition of two different types of proximity: proximity in co-publication networks and actual, spatial proximity at conferences.

A baseline which should be considered here is an in-depth analysis of co-publication data. The core question is which general position a particular researcher assumes in the network of documented collaboration, as captured by specific network measures. In the scientific domain, where access to knowledge is a key criterion of success, a central network position equals not only access per se, but also power. Hence, a central position in co-publication networks can be interpreted as a proxy for influence.

In light of this, our analysis proposes a way to infer the extent of informal exchanges based on researchers’ joint presence at conferences, as documented through submitted conference proceedings. As conference proceedings are rarely rigidly peer reviewed, they can be viewed as evidence of early results from ongoing work and thus, by and large, as documentation of a researcher seeking exchanges on work in progress. Beyond the presentation of research findings, conferences represent one of the most important spaces, if not the most important one, in which scientists cultivate intellectual exchanges, make new acquaintances, forge new alliances, and lay the groundwork for new collaborations (Viglione, 2020; Torre & Rallet, 2005). In summary, joint presence at conferences opens up a window of insight into those early, more informal interactions which remain impossible to trace in detail.

This paper provides a threefold contribution. First, it proposes and validates additional ways in which the analysis of co-publication networks can be leveraged to expand existing perspectives beyond the fundamental concept of access to knowledge. Second, it proposes and validates a novel methodology to account for co-presence at conferences as a complement to such analysis. Third, it explores which standard indicators of scientific performance (as reflected in the visibility of publications) are affected by these two aspects. It concludes by discussing these insights in the context of the existing literature.

Conceptual section

For a number of years, different studies have confirmed that the academic performance of scholars depends in one way or another on their degree of networkedness (Abbasi et al., 2011, 2012; Guan et al., 2015). More importantly, networkedness is a highly person-specific characteristic in itself, which cannot be derived from personal characteristics like scientific age (Badar et al., 2014) or organisational affiliation (Cugmas et al., 2020).

An increasing body of academic work has come to investigate this phenomenon in a more differentiated manner, including various aspects of networkedness, such as degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality (Bordons et al., 2015; Tajedini et al., 2019; Xu & Chang, 2020). Moreover, some studies included additional attributes of the nodes to which the individual in question is connected, such as their location (cf. e.g. Persson, 2010; Nomaler et al., 2013; Schmoch & Schubert, 2008 on national versus international cooperation), as well as the question of whether cooperation partners are highly cited or not (Fassina Santini et al., 2021; Hâncean et al., 2021). Most commonly, the conceptual approach of these studies focused on access to knowledge (number and accessibility of collaborators) or on academic capacity (citation rates of collaborators). Similarly, betweenness centrality was most often interpreted as the capacity to broker knowledge, but not necessarily as the capacity to broker personal relations. As a possible consequence of this dominant narrative, network measures pertinent to aspects of influence and control, such as cores, cliques, or islands, are less frequently considered. There are a few exceptions, such as the consideration of eigenvector centrality by Xu and Chang (2020), blockmodeling by Cugmas et al. (2020), or the consideration of structural holes by Guan et al. (2015).

However, the results of existing studies remain in part inconclusive when considering more complex measures like betweenness centrality or closeness centrality. With a view to those, findings depend on the specific research domain or country sample under consideration (e.g. Abbasi et al., 2011 vs. Abbasi et al., 2012 vs. Xu & Chang, 2020 vs. Ortega, 2014 vs. Tajedini et al., 2019). So far, moreover, most of these samples have been rather limited in size, so that it seems difficult to infer any more generalisable findings to start with.

At the same time, a parallel strand of literature has developed an ever clearer understanding of the role of temporary proximity in diverse domains of science and the economy (Balland et al., 2015; Maskell et al., 2006; Rallet & Torre, 1999; Torre, 2008; Urry, 2002). While the origins of this debate can be traced back more than four decades (Hägerstrand, 1970), it has only recently resurfaced with greater momentum, following the general rise of proximity-related considerations in innovation studies since the mid-2000s (Gertler, 2008; Growe, 2018; Henn & Bathelt, 2015; Robertsson & Marjavaara, 2014; Torre, 2015). In principle, its core argument is that proximity gives rise to the serendipitous generation of ideas, allows implicit methodological knowledge to be conveyed, and creates opportunities for the codification of latent conceptual ideas through in-person discussions among experts. In turn, these create the foundations for subsequent scientific activities, results, and performance. Even if these processes may be less obvious in science than in engineering, the mere fact that scientific conferences have resisted all attempts to virtualise them (Viglione, 2020) suggests that many researchers consider co-presence relevant for their professional development (rather than for mere touristic purposes).

Conceptually, this line of thought draws on an agency-based understanding of the practice of science and innovation, in which new ideas result not only from access to codified information, but grow most readily from contact and exchange with other individuals who share a similar cognitive framework (Asheim & Coenen, 2005; Asheim et al., 2007; Grillitsch & Sotarauta, 2019). Complementing the more general “communities of practice” argument, it suggests that to effectively develop joint ideas even cognitively close individuals will have to meet on a personal basis in the same physical space (Asheim et al., 2007; Bathelt et al., 2004; Gertler, 2008; Growe, 2018; Henn & Bathelt, 2015; Li, 2014; Rallet & Torre, 1999; Torre, 2015). At the very least, there is compelling empirical evidence that, for decades, meeting in this way has served to facilitate knowledge production and the set-up of new, creative collaborations which are necessary to fuel the effective generation of new knowledge (Belso-Martinez, 2012; Gertler & Levitte, 2005; Grabher, 2002; Robertsson & Marjavaara, 2014; Rychen & Zimmermann, 2008).

In more concrete terms, presence at academic conferences is—or at least used to be—considered essential to remain at the forefront of scientific developments and to liaise with, interact with, and co-opt appropriate colleagues for future joint undertakings (Kim, 2010, 2017; Shelley-Egan, 2020). Without these initial, casual interactions, few scientists would ever reach the awareness or versatility of thought which underpins high-quality conceptual contributions and ensures they remain both topical and pertinent to current debates (Oester et al., 2017; Parker & Weik, 2014; Storme et al., 2017). In line with this, Gorodnichenko et al. (2021) demonstrate that presentations at conferences influence the likelihood of subsequent publication while Leon and McQuillin (2018) demonstrate the same for subsequent citation. In parallel, similar findings have been made for business fairs (Henn & Bathelt, 2015; Li, 2014) or for co-presence at workplaces (Gertler, 2008; Growe, 2018; Lassen, 2009), suggesting a general applicability of the underlying principle. That said, some studies have raised caveats concerning the negative impact of constant conference mobility, due to, inter alia, long absences and physical and psychological stress factors (Cohen & Gössling, 2015; Høyer & Næss, 2001). Nonetheless, current debate suggests that physical meetings at conferences will remain essential once the pandemic has subsided, even if overall conference travel will become more limited than before (Viglione, 2020).

As in the case of formal collaboration, previous research suggests that the intellectual productivity of informal exchanges increases, at least up to a certain degree, with their diversity (Fox, 2005; Fursov et al., 2016). Accordingly, the relevance of informal contacts should depend not only on the sheer number of conference visits that a researcher has undertaken, but also on the number of other researchers whom he or she encountered during these visits, and on his or her degree of centrality in this network of co-presence. In practice, this is likely to reflect whether one meets frequently with the ‘usual suspects’ at the same type of conferences, or whether one explores novel research domains, be this through an active effort to widen one’s field of vision or as a function of increasing seniority and broader responsibilities.

With regard to the dimensions of academic performance which either type of networkedness might help to explain, many earlier studies have relied on citation-based measures like the g-index (Abbasi et al., 2011; Bordons et al., 2015; Xu & Chang, 2020). In the extant literature, hypotheses focus mostly on visibility- or citation-orientated dimensions of performance instead of on pure output or scientific productivity. Beyond this commonality, the choice of concrete dependent variables remains somewhat idiosyncratic, derived from the specific study context. Arguably, the prevalent use of the g-index can be considered a choice of convenience, as it is an established benchmark. In our view, the conceptual consideration of which specific effects different aspects of networkedness may trigger has remained somewhat underdeveloped.

In this study, we therefore deploy a threefold perspective on core visibility-related subdimensions of scientific performance which we consider central from a conceptual perspective:

  • Citations per paper: to reflect the achieved de-facto recognition of publications, which is commonly held to relate to their scientific quality or at least pertinence,

  • Crown indicator: to reflect the achieved de-facto recognition of publications while controlling for discipline-specific particularities, and

  • Average journal impact factor: to reflect the potential visibility of a publication which results from the fact that it has been placed on a visible platform.


Against this background, the question remains how these different elements of academic connectedness and proximity relate. So far, no comprehensive model has been put forward in the literature; arguably because most of the empirical means available remained rather limited and partial. Yet, a number of key propositions can be considered established or at least suitable to serve as strong hypotheses.

First, a generic track record of past collaboration with other scientists, i.e. the overall number of contacts to other distinct scientists, will prove beneficial to both the level and the quality of a researcher’s current scientific output. So far, this assumption has been fairly unanimously confirmed in the existing literature (Abbasi et al., 2011, 2012).

Second, the degree of influence which a scientist holds within their community—as documented by their central role in key clusters within past collaboration networks—will prove beneficial for both the level and the quality of a researcher’s current scientific output. While this proposition has been less widely explored, some earlier findings on the role of eigenvector centrality suggest it deserves further investigation (Xu & Chang, 2020).

Third, the role of brokering functions (betweenness centrality) or general accessibility (closeness centrality) deserves further investigation, as the results of prior research have been empirically somewhat inconclusive, while conceptually suggesting that these roles may indeed be of relevance (Bordons et al., 2015; Tajedini et al., 2019; Xu & Chang, 2020).

Fourth, scientists’ general propensity to attend and present at conferences can be expected to influence their ability to participate in processes of intellectual exchange and hence their ability to make high-quality, pertinent, and topical contributions. While there is no prior evidence for this proposition, it follows conceptually from the literature on the role of temporary co-presence (Henn & Bathelt, 2015; Torre, 2015) and respective simulations (Frey & Šešelja, 2020; Kummerfeld & Zollman, 2016).

Fifth, the sheer number of visits to conferences may not directly influence scientists’ propensity to make high-quality contributions. Instead, what deserves investigation is whether the role an individual plays during such meetings is decisive, and how far such regular meetings maximise opportunities for serendipity and the cross-fertilisation of ideas. Such diversity of input may well not depend on the mere frequency of visits alone (Frey & Šešelja, 2020; Kummerfeld & Zollman, 2016).

Finally, a number of complementary factors can be assumed to superimpose and obscure the effects of networkedness and should therefore be included as control variables. These include scientific age, gender, and organisational environment (Fursov et al., 2016; Carayol & Matt, 2006).

In summary, this paper aims to corroborate and link some key assumptions about the central preconditions of scientific performance, which, in the extant conceptual literature, have been discussed under different headings: access to codified knowledge (co-publication) and encounters with colleagues (co-presence). Both are essential to transfer implicit knowledge, explicate latent ideas, and—on the basis of serendipity—give rise to new ones.

The authors’ ambition is to identify general principles which apply across disciplines and communities. We are fully conscious that the particular characteristics of different disciplines and localities remain to be identified. Thus, our objective here is only to address the fundamental concepts of the debate and to avoid such particularities.

Importantly, it is not our goal to explain all origins of scientific performance comprehensively and exhaustively. The definition of academic success will always be personal and idiosyncratic. A comprehensive analysis would require the inclusion of many additional indicators and lines of argument. Hence, it is not the ambition of this paper to determine the relative role of network embeddedness and co-presence vis-a-vis other factors. Instead, it will, in the first instance, confirm that both are significant predictors of scientific performance and, in the second instance, compare their predictive power vis-a-vis each other.

One particularity we have chosen to include is a nationally specific perspective. Both practically and empirically, it would be next to impossible to address the global republic of science altogether. Moreover, the universe of publication activities is dominated by some leading countries to such an extent that an attempt to cover all countries would effectively result in us covering a mix of the U.S., the U.K. and China. Instead, we have chosen to focus on the entirety of German publications from 2010 to 2018, i.e. publications co-authored by at least one person with a main academic affiliation in Germany. Thus, we focus our analysis on the academic system which we conceptually know best and feel best placed to interpret in a qualified manner.

Methods and data

Data generation and dataset

To conduct our analysis, we primarily used Elsevier’s Scopus database. Scopus is a bibliometric database which includes publications in more than 22,000 international journals as well as a large number of conference papers. Among the possible sources, we chose Scopus because its coverage of disciplines is very broad. In addition, “non-core” literature, such as conference proceedings or journals with lower citation rates, is sufficiently covered. It also displays a rather balanced global coverage without overt bias (Michels & Schmoch, 2012).

The sample of publications considered for analysis includes all publications listed in Scopus which include at least one author with a German affiliation during the period from 2010 to 2018. This period was chosen to cover a substantial stretch of time in which personal attributes can be interpreted as characteristics rather than as idiosyncrasies, and in which citations have time to materialise. With regard to the former aspect, eight to ten years are commonly considered the minimum. With regard to the latter aspect, the dataset has to end in 2018, as citation figures and all derived measures were not yet complete for subsequent years at the time of writing. Across all disciplines, a total of 720,377 distinct authors could be identified, 624,806 of whom had published at least one article, review, letter, or note (i.e. a publication other than a conference proceeding). To this dataset, we added further variables from Scopus as well as other sources, which are described in more detail in the following subsections.

The response variables

As a bibliometric database, Scopus includes all information listed on a publication in addition to citation information, i.e. the number of references listed in a publication as well as the number of citations it has received from subsequent publications. Based on this information, we are able to calculate three indicators which serve as dependent variables for our models. These are:

  1. The average number of received citations per publication (over a 3-year time window),

  2. The field-normalised average citation count, also known as the crown indicator (e.g. Waltman et al., 2011), and

  3. The average journal impact factor (JIF) of journals in which an author’s articles are published, e.g. if an author has published in three different journals, the average JIF of these three journals was calculated.
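The three response variables can be illustrated with a minimal sketch. It is not the authors' code: it assumes that each publication already carries its 3-year citation count, an expected citation value for its field and year, and the impact factor of its journal (all hypothetical inputs). For the crown indicator it uses the mean-of-ratios variant (MNCS) discussed by Waltman et al. (2011); the text does not state which variant was computed. The average JIF is taken over distinct journals, following the example given for the third indicator.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Paper:
    citations_3y: int    # citations received within a 3-year window
    expected_3y: float   # hypothetical: mean 3-year citations of all papers of the same field and year
    journal: str
    jif: float           # hypothetical journal impact factor lookup


def citations_per_paper(papers):
    """Indicator (1): mean number of citations per publication."""
    return mean(p.citations_3y for p in papers)


def crown_indicator(papers):
    """Indicator (2), MNCS variant: mean of observed/expected citation ratios."""
    return mean(p.citations_3y / p.expected_3y for p in papers)


def average_jif(papers):
    """Indicator (3): mean JIF over the distinct journals an author published in."""
    jif_by_journal = {p.journal: p.jif for p in papers}
    return mean(jif_by_journal.values())


papers = [
    Paper(10, 5.0, "Journal A", 2.0),
    Paper(2, 4.0, "Journal A", 2.0),
    Paper(6, 3.0, "Journal B", 5.0),
]
print(citations_per_paper(papers), crown_indicator(papers), average_jif(papers))
```

For the toy data, an author with three papers in two journals obtains 6 citations per paper, a crown indicator of 1.5, and an average JIF of 3.5; averaging per article instead of per journal would give 3.0.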

Independent variables

Network-based measures for co-publications and co-occurrences at scientific conferences

With an eye to the main hypotheses of this paper, pertaining to researchers’ embeddedness in a network of co-publications and their embeddedness in a social network constituted by regular meetings, the following two network datasets were generated as a basis to calculate different network measures which serve as independent variables in our models.

First, we generated a co-publication network on the basis of co-publications of the type “article”, “review”, “letter”, or “note”, with the authors serving as the nodes and their respective co-publications as the edges of the network. With respect to those authors who realised at least one co-publication, we identified 576,189 single authors, 6,902,090 constellations of partners, and a total of 91,660,176 individual co-publication links between authors, including multiple co-publications induced by a single paper (network density: 0.00004158; cf. Footnote 1). From there, we limited our sample to those authors who were connected by at least five co-publications, which reduced the sample size to 117,767 single authors. Within this sample, 793,726 different constellations of partners and a total of 83,459,790 individual co-publication links could be identified (network density 0.00011446).
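The counting logic behind the co-publication network can be illustrated with a short sketch. It assumes that each paper is given as a list of author identifiers, that a paper with k authors induces k(k−1)/2 pairwise links, and that network density is the standard undirected measure 2E / (n(n − 1)); Footnote 1 may define it differently. With E taken as the number of distinct partner constellations, this formula reproduces both densities reported above. Whether the five-co-publication filter applies to author pairs or to individual authors is not fully specified in the text; the sketch assumes pairs.

```python
from collections import Counter
from itertools import combinations


def copublication_links(author_lists):
    """Count co-publications per unordered author pair.

    len(result) = number of distinct partner constellations (edges);
    sum(result.values()) = individual co-publication links, where a single
    paper with k authors contributes k*(k-1)/2 links.
    """
    links = Counter()
    for authors in author_lists:
        for a, b in combinations(sorted(set(authors)), 2):
            links[(a, b)] += 1
    return links


def keep_strong_links(links, min_copubs=5):
    """Assumption: the 'at least five co-publications' filter is applied per author pair."""
    return {pair: w for pair, w in links.items() if w >= min_copubs}


def density(n_nodes, n_edges):
    """Share of all possible undirected author pairs that are actually linked."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Node and edge counts as reported in the text
print(round(density(576_189, 6_902_090), 8))  # full co-publication network
print(round(density(117_767, 793_726), 8))    # after the five-co-publication filter
```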

Second, we generated another network reflecting presumed meetings at conferences, which may constitute a further relevant and thus far unexamined aspect of scientific collaboration. To this end, we collected 265,771 conference proceedings from 2010 to 2018 from Scopus, which were published by 186,327 distinct German authors, of whom 84,596 could be identified as first authors of 203,024 conference proceedings (the latter number being lower because first authors of papers with German participation can be non-German).

Our approach aims to gather information on an author’s documented co-presence at conferences and, on that proxy basis, to derive a network of potential personal encounters, or at least of joint temporary embedding in a specific space of professional discourse. To source the data, we analysed conference proceedings in Scopus, extracting whether an author had attended a specific conference in a given year. Since Scopus covers only a limited number of conferences, we enriched the dataset with information on conference proceedings from Microsoft Academic (MAG). Microsoft Academic was a freely available bibliographic database hosted by Microsoft Research, containing information on publications, authors, and their affiliations for articles as well as for conference proceedings. The MAG data were matched with the main dataset based on author names, affiliations, and field of study, using a Levenshtein distance computed on both name and affiliation. For the field of study, a manual concordance at a coarse-grained level was developed to ensure that authors’ research fields were comparable across databases and thus to avoid homonym issues. To address researchers publishing in different fields of science, we established one main field of science for each author over his or her career, which served as the criterion qualifying him or her as a match. Detailed cross-checks revealed that, on average, about 90% of an author’s publications during the period of observation were at least partially classified into their defined “main field”. Moreover, false negatives in matching are rather unlikely, as each publication is assigned to 2.96 fields of science on average, so that the main field is likely to remain among them even if authors shift their focus over time.
Furthermore, manual checks confirmed that radical shifts in orientation towards an entirely unrelated field are rare—and might in any case justify the exclusion of such individuals as outliers in substance.
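The record linkage between Scopus and MAG might look roughly as follows. This is a sketch, not the matching procedure actually used: the normalisation of the Levenshtein distance by string length, the similarity thresholds (0.9 for names, 0.8 for affiliations), and the record layout are hypothetical; only the ingredients (name and affiliation similarity plus agreement on a coarse main field) come from the text. Taking the main field as the most frequently assigned field across an author's publications is likewise an assumption.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions), two-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance rescaled to [0, 1], where 1 means identical strings."""
    a, b = a.lower().strip(), b.lower().strip()
    return 1 - levenshtein(a, b) / (max(len(a), len(b)) or 1)


def main_field(fields_per_pub):
    """Most frequently assigned coarse field across an author's publications."""
    counts = Counter(f for fields in fields_per_pub for f in fields)
    return counts.most_common(1)[0][0]


def main_field_share(fields_per_pub, field):
    """Share of publications at least partially classified into `field`."""
    return sum(field in fields for fields in fields_per_pub) / len(fields_per_pub)


def is_match(scopus_author, mag_author, name_min=0.9, affil_min=0.8):
    """Hypothetical thresholds; records are dicts with 'name', 'affiliation', 'main_field'."""
    return (similarity(scopus_author["name"], mag_author["name"]) >= name_min
            and similarity(scopus_author["affiliation"], mag_author["affiliation"]) >= affil_min
            and scopus_author["main_field"] == mag_author["main_field"])
```

A transliterated umlaut, such as "Müller" versus "Mueller", already costs two edits, which is one reason why a tolerance threshold rather than exact string equality is needed.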

Both data sources, Scopus and MAG, display a notable bias towards the engineering and natural science domains (in which conference proceedings are more prevalent, cf. Table 1), but not to such an extent as to cast doubt on their use as a valid proxy altogether. Nevertheless, this limitation has to be kept in mind when interpreting and generalising the results.

Table 1 Number and share of conference proceedings by discipline (2010–2018)

In several of the examined academic disciplines, e.g. in physics, it is customary to give credit to entire research teams. Our approach therefore assumes (potential) co-presence for the first authors only, instead of for the whole team. Assuming that all authors were present would evidently lead to an indicator based on a wrong premise. In search of a better approach, we observed that first authors are generally the ones who travel to present new findings. Even where first authorship reflects hierarchy or seniority rather than an actual presenter’s role, it is likely that the professor involved in the project would attend the conference alongside his or her PhD candidate, who presents the research. In this case, the assumption that the first author is present at the conference would still hold. Although this secondary assumption may also fail in some cases, a non-representative review of conference documentation suggests that such cases are a minority. In any event, the presumed co-presence at the personal level would then be replaced by co-presence at the team level, so that the result would be fuzziness rather than a genuinely false indication. Against this background, we decided to focus our conference proceedings-based co-presence analysis on first authors alone.

In the network of co-presence developed on this basis, we documented 2,568,487 potential encounters of first authors on a total of 2,861,248 occasions; most of these are singular incidences, implying that most pairs of authors meet only once at a conference. If single encounters are excluded, the number drops to 233,385 encounters on a total of 526,146 occasions. Only 1,713 pairs of first authors met more than five times during the period from 2010 to 2018, on a total of 11,981 occasions. Notably, this group of regular co-attendees at conferences is rather small. Still, the degree of concentration is lower than for the publication network (9% compared to 32% with only one connection), and consequently the network is denser on the whole (network density of 0.00075618 vs. 0.00011446) (see Table 2).
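The derivation of the co-presence network from first-author records can be sketched as below. It assumes that each record is a (first author, conference, year) triple and that two first authors "meet" whenever they attended the same conference edition; the identifiers are hypothetical. Under this reading, "encounters" are distinct author pairs and "occasions" are the total number of pairwise co-attendances, which is how the counts above appear to be defined, although the text does not state this explicitly.

```python
from collections import Counter, defaultdict
from itertools import combinations


def copresence_pairs(records):
    """records: iterable of (first_author, conference_id, year) tuples.

    Returns a Counter mapping each unordered pair of first authors to the
    number of conference editions both attended (their 'occasions').
    """
    attendees = defaultdict(set)
    for author, conference, year in records:
        attendees[(conference, year)].add(author)
    pairs = Counter()
    for authors in attendees.values():
        for a, b in combinations(sorted(authors), 2):
            pairs[(a, b)] += 1
    return pairs


def summarise(pairs):
    return {
        "encounters": len(pairs),                             # distinct pairs that met at least once
        "occasions": sum(pairs.values()),                     # total pairwise co-attendances
        "repeated": sum(1 for c in pairs.values() if c > 1),  # pairs meeting more than once
        "more_than_five": sum(1 for c in pairs.values() if c > 5),
    }


records = [
    ("A", "ConfX", 2010), ("B", "ConfX", 2010), ("C", "ConfX", 2010),
    ("A", "ConfY", 2011), ("B", "ConfY", 2011),
]
print(summarise(copresence_pairs(records)))
```

Enumerating all pairs per conference edition grows quadratically with attendance, so large conferences dominate the runtime. In the toy records, the pair A–B is the only repeated encounter.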

Table 2 Distribution of Joint Conference visits

To derive suitable indicators for subsequent analysis, the two network matrices, one for articles (including letters, reviews, and notes) and one for conference proceedings, were imported into the Pajek software package, where the network measures were calculated separately for each. The network measures were then exported into delimited files and matched to the main dataset of the abovementioned 720,377 single authors as attributes (1:1 matching), so that all network indicators were available in the main dataset at the author level. For each author, we derived the following variables from the network analysis of co-publications and proceedings:

  • Degree centrality: the total number of other nodes (authors) with which a node is connected, which serves as a basic indicator of connectivity,

  • Closeness centrality: the reciprocal of the average length of the shortest paths between the node and all other nodes, so that higher values indicate a more accessible position of an author in the network,

  • Betweenness centrality: a measure of how often a node lies on the shortest paths between pairs of other nodes, i.e. acts as a bridge between them, serving as an indication of a brokering function in the network,

  • Eigenvector centrality: a measure based on the notion that connections to highly networked nodes contribute more to the prestige of a given node than equally many connections to less networked nodes, so that higher values imply a larger influence or prestige of an author in the network, and

  • Islands: connected clusters of vertices in a network with weighted vertices, in which the weights (“heights”) of all vertices on the island are larger than the weights of the vertices in its neighbourhood. In our analysis, the minimum size of islands is set to one vertex and the maximum size to ten. The islands indicator provides information on the positioning of authors in subareas of particular density (e.g. closed co-publication circles).
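To make the five measures concrete, the following dependency-free Python sketch computes them on a toy graph. It is not Pajek's implementation and its conventions are assumptions: closeness is the reciprocal mean distance to reachable nodes, betweenness follows Brandes' algorithm without normalisation, eigenvector centrality uses power iteration on A + I, and the island function only checks whether a given vertex set satisfies the definition above (Pajek enumerates islands itself). All function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source, allowed=None):
    """Unweighted shortest-path lengths from source, optionally restricted to `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree(adj):
    """Number of distinct neighbours (co-authors or co-attendees)."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness(adj):
    """Reciprocal of the mean distance to the other reachable nodes (0 if isolated)."""
    result = {}
    for v in adj:
        dists = [d for u, d in bfs_distances(adj, v).items() if u != v]
        result[v] = len(dists) / sum(dists) if dists else 0.0
    return result


def betweenness(adj):
    """Brandes' algorithm: summed fraction of shortest paths between other pairs
    passing through each node (undirected, each unordered pair counted once)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: x / 2 for v, x in score.items()}


def eigenvector(adj, iterations=500, tol=1e-12):
    """Power iteration on A + I (same leading eigenvector as A for a connected
    graph, but no oscillation on bipartite graphs); maximum score scaled to 1."""
    x = dict.fromkeys(adj, 1.0)
    for _ in range(iterations):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(new.values())
        new = {v: val / top for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


def is_vertex_island(adj, height, cluster, min_size=1, max_size=10):
    """True if `cluster` is a connected set of min_size..max_size vertices that
    are all higher than every vertex in its neighbourhood."""
    cluster = set(cluster)
    if not min_size <= len(cluster) <= max_size:
        return False
    start = next(iter(cluster))
    if len(bfs_distances(adj, start, allowed=cluster)) != len(cluster):
        return False  # not connected
    border = set().union(*(adj[v] for v in cluster)) - cluster
    return not border or min(height[v] for v in cluster) > max(height[u] for u in border)


# Toy path a - b - c, adjacency stored as sets; heights are hypothetical
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
heights = {"a": 1, "b": 5, "c": 4}
print(degree(toy), closeness(toy), betweenness(toy), eigenvector(toy))
```

On this path graph the middle node b has the highest degree, closeness, betweenness and eigenvector score, and {b, c} qualifies as an island because both of its heights exceed that of its only neighbour a.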


In addition to the network-based indicators we also included information on the number of conferences visited by each author based on MAG data, independent of whether such visits resulted in the publication of a conference proceeding listed in Elsevier Scopus. We added this information into our models as a further independent variable.
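The 1:1 matching step, in which exported network measures (and, analogously, conference-visit counts) are attached to the author-level dataset as attributes, can be sketched as follows. The file layout (a semicolon-delimited export with an `author_id` column), the column names, the `pub` prefix and the helper names are assumptions for illustration; so is the choice to leave authors outside a network as `None`, since the text does not specify how such cases were coded.

```python
import csv
import io


def load_measures(text, delimiter=";"):
    """Read a delimited export with one row per author into {author_id: {measure: value}}.
    A repeated author_id would violate 1:1 matching and raises an error."""
    measures = {}
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        author_id = row.pop("author_id")
        if author_id in measures:
            raise ValueError(f"duplicate author_id {author_id!r}")
        measures[author_id] = {name: float(value) for name, value in row.items()}
    return measures


def attach(authors, measures, prefix):
    """Copy each measure onto the matching author record as '<prefix>_<measure>'."""
    unknown = set(measures) - set(authors)
    if unknown:
        raise ValueError(f"{len(unknown)} network nodes missing from the author dataset")
    names = list(next(iter(measures.values()))) if measures else []
    for author_id, record in authors.items():
        found = measures.get(author_id)
        for name in names:
            # Hypothetical convention: authors outside this network get None.
            record[f"{prefix}_{name}"] = found[name] if found else None
    return authors


# Hypothetical example data
authors = {"A1": {"gender": 1}, "A2": {"gender": 0}, "A3": {"gender": 0}}
export = "author_id;degree;closeness\nA1;3;0.5\nA2;1;0.25\n"
attach(authors, load_measures(export), prefix="pub")
print(authors)
```

Running the same `attach` call a second time with a `conf` prefix for the proceedings network would keep both sets of indicators side by side in one author record.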

General control variables

We selected a number of control variables from Scopus. These reflect characteristics which, according to earlier studies, can be assumed to influence an author's general propensity for publication and should therefore be controlled for. They include scientific age (time since the first publication entry in the Scopus database), assuming that scientists who have been part of the system for longer reach a higher number of citations on average, as well as gender (coded 1 for female and 0 for male), assuming that women are disadvantaged in the science system and reach a lower number of citations on average. In addition, we added a set of dummy variables controlling for affiliation type (private firm / public research institution / university) and field of science (Scopus ASJC classification at the 2-digit level). Finally, we added the number of publications and the number of conference proceedings as control variables in order to control for an author's total publication output.
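Dummy coding of categorical controls such as affiliation type can be sketched as follows. Which category serves as the omitted reference is not stated in the text; university is chosen here purely for illustration, and the helper `dummy_code` is hypothetical. Dropping one level avoids perfect collinearity with the regression intercept.

```python
def dummy_code(values, reference):
    """One 0/1 column per category except `reference`, which is absorbed by
    the intercept (avoids perfect collinearity in OLS)."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical affiliation types for four authors
affiliation = ["university", "private firm", "public research institution", "university"]
cols = dummy_code(affiliation, reference="university")
print(cols)
```

Each coefficient on such a dummy is then read relative to the reference category, which is why the choice of reference matters for interpretation but not for model fit.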

This leaves us with a final dataset of 576,189 German authors, including the dependent variables and control variables calculated from Scopus as well as the independent variables derived from the network analyses based on Scopus and MAG data.

The models

Our aim was to test whether scientific networks, measured by co-publications and co-presence at conferences, are related to academic performance as indicated by citation measures. We therefore ran a series of ordinary least squares (OLS) regression models, using the network measures for co-publications and conference proceedings as well as the number of conference visits as our main independent variables, together with the control variables described above. To make the coefficients comparable in size within each model, all independent variables and controls were z-standardised.
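The effect of z-standardising the predictors on OLS coefficients can be shown with a small, self-contained sketch. It solves the normal equations by Gaussian elimination for illustration only (statistical packages typically use more stable QR-based solvers), and the data are hypothetical. As in the text, only the predictors are standardised, so each slope becomes the expected change in the response per standard deviation of the predictor.

```python
from statistics import mean, stdev


def zscore(xs):
    """z-standardise: subtract the mean, divide by the sample standard deviation."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]


def solve(a, b):
    """Solve a . x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(columns, y):
    """OLS via the normal equations (X'X)b = X'y; returns [intercept, b1, b2, ...]."""
    X = [[1.0] + list(row) for row in zip(*columns)]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated exactly as y = 2 + 3*x1 - x2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [3, 7, 7, 11, 12]
raw = ols([x1, x2], y)
std = ols([zscore(x1), zscore(x2)], y)
print(raw, std)
```

With the raw predictors the sketch recovers the generating coefficients 2, 3 and -1; with standardised predictors the intercept becomes the mean of y and each slope is multiplied by its predictor's standard deviation, which puts predictors measured on different scales on a common footing.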

The following measures of academic performance were used as response variables in our models: (1) the average number of citations per publication, (2) the field-normalised citation count at the author level, commonly called the crown indicator (Waltman et al., 2011), which eliminates field-specific citation effects, and (3) the average impact factor of the journals in which an author's articles are published. As the first two indicators were not normally distributed, they were log-transformed, after which the criteria for normal distribution were sufficiently met for OLS estimation. The distribution of the average journal impact factor at the author level met these criteria without transformation. As all variables are metric (or sufficiently quasi-metric), the analysis was conducted by means of standard OLS regressions.
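The following sketch illustrates why a log transformation helps with right-skewed citation counts. The text does not state how zero values were handled; a log(1 + x) transform is assumed here, and the citation values are hypothetical. Skewness is measured with the moment coefficient g1, which is 0 for a symmetric distribution.

```python
import math


def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


citations = [0, 1, 1, 2, 3, 5, 8, 13, 21, 100]   # hypothetical per-author averages
logged = [math.log1p(c) for c in citations]      # log(1 + x) keeps zeros defined
print(round(skewness(citations), 2), round(skewness(logged), 2))
```

A single highly cited author dominates the raw distribution, whereas after the transformation the skewness drops well below 1.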

Overall, we estimated three main models for our three major dependent variables: M1 for the average number of citations per publication, M2 for the field-normalised citation rate (CI), and M3 for the average impact factor of journals. Each model is presented in three versions: (M.1) for variables relating to the co-publication network only, (M.2) for variables relating only to the presumed co-presence at conferences, and (M.3) for variables which combine both aspects.

For all models, we conducted variance inflation factor (VIF) tests for multicollinearity. As a rule of thumb, VIFs > 5 indicate a large degree of multicollinearity; the tests showed no substantial multicollinearity between the different network measures. The bivariate correlation analysis (provided in the annex to this paper) points in the same direction.
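VIFs can be computed as VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all other predictors; equivalently, they are the diagonal entries of the inverse of the predictors' correlation matrix. The sketch below uses the latter route, with hypothetical helper names and data.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(m):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    n = len(m)
    a = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[p] = a[p], a[col]
        piv = a[col][col]
        a[col] = [v / piv for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def vifs(columns):
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal entry of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]


# Hypothetical predictors with correlation r = 0.8
v = vifs([[1, 2, 3, 4, 5], [2, 1, 4, 3, 5]])
flagged = [j for j, value in enumerate(v) if value > 5]   # rule-of-thumb threshold
print(v, flagged)
```

For two predictors with r = 0.8 this gives 1 / (1 - 0.64) ≈ 2.78 for each, well below the rule-of-thumb threshold of 5.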

As the sample was very large (close to the entire known population), the coefficients of the models were highly significant in most cases unless a relation was completely absent. Our subsequent interpretations therefore focus primarily on the direction and order of magnitude of effects rather than on their (quasi-omnipresent) significance. At the same time, we chose to hold significance to a much stricter standard than usual: effects with p > 0.001 were already considered non-significant.
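A short calculation shows why almost every coefficient becomes significant at this sample size. The sketch tests a hypothetical, substantively negligible correlation of r = 0.01 using t = r·sqrt(n - 2) / sqrt(1 - r²) and a normal approximation to the t distribution, which is adequate for large n. The function name is illustrative.

```python
import math


def corr_p_value(r, n):
    """Two-sided p-value for a Pearson correlation r on n observations, using
    t = r*sqrt(n-2)/sqrt(1-r^2) and a normal approximation to the t distribution."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return math.erfc(abs(t) / math.sqrt(2))   # P(|Z| > |t|)


for n in (1_000, 576_189):
    print(n, corr_p_value(0.01, n))
```

At n = 1,000 the correlation is far from significant (p ≈ 0.75), whereas at n = 576,189 it falls well below the strict threshold of p = 0.001; even r = 0.005 would still pass that threshold.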

Results

When analysing the effects of traditional, co-publication-based embeddedness only (Model 1.1, Table 3), we find that not only the simple extent of connectedness (degree centrality) but, even more so, the degree of overall accessibility (closeness centrality) is significantly and positively related to academic performance. Indicators related to brokerage (betweenness centrality) or influence/prestige (eigenvector centrality), on the other hand, show comparatively weak negative relations. The weakest, yet still significant, of all relationships concerns whether or not a node is part of a particularly prominent island.

Table 3 Influence of different aspects of networkedness on the average number of citations per publication of a researcher

These relationships can be observed for all three dependent variables under consideration (cf. Models 2.1 and 3.1 in Tables 4 and 5, respectively). Among them, the role of closeness centrality exceeds that of degree centrality most clearly for the average journal impact factor, where the coefficients differ by an order of magnitude. For the crown indicator, in contrast, this difference amounts to only about a factor of two. What differs more strongly is the share of variance explained (R²), which ranges from close to 50% for the average journal impact factor to less than 25% for the crown indicator. Nevertheless, all models explain a robust share of the observed variance, taking into account that a number of other, more person- and institution-specific factors known to influence scientific performance have not been considered in this study.

Table 4 Influence of different aspects of networkedness on individual researchers’ crown indicator
Table 5 Influence of different aspects of networkedness on the average impact factor of journals in which a researcher publishes

With regard to the regression models which only include the conference network indicators, we also find that the number or, more precisely, breadth of overall (potential) encounters (degree centrality) plays a relevant role, much more so than the mere number of conference visits per se (Model 1.2, Table 3). Beyond this basic finding, however, the results differ starkly. In the context of co-presence at conferences, the coefficient for accessibility (closeness centrality) is markedly negative. At the same time, influence and prestige among relevant peers (eigenvector centrality) are positively related to performance, and more strongly so than mere degree centrality. For brokerage (betweenness centrality) and being part of specific circles (islands), the coefficients are not significant.

In the case of co-presence at conferences, the overall relation of the coefficients is very similar regardless of the dependent variable (cf. Models 2.2 and 3.2 in Tables 4 and 5). That said, coefficients are highest for citations per publication and for the average journal impact factor, and smaller by a factor of approximately two for the crown indicator. Although these effects are significant, the explanatory power of a purely conference interaction-based approach is far more limited than that of a traditional co-publication-based approach, with about 20% of variance explained for citations per publication and average journal impact factor, and about 10% for the crown indicator. Conference proceedings-based indicators thus explain less of the variance in academic performance (as measured by citation-based indicators) than publication-based indicators.

While a conference interaction-based approach reaches substantial explanatory power (R² values amounting to 50–60% of those of the co-publication-based approaches), it adds little when combined with co-publication-based indicators (Models 1.3, 2.3, 3.3; Tables 3, 4, 5). Accessibility in co-publication networks continues to dominate the range of effects, while R² increases only minimally, by hardly more than a percentage point. The coefficients are also similar in size to those in the models which include only co-publication-based or only conference-based indicators.

In sum, our findings confirm our first hypothesis that a generic track record of past collaboration with other scientists in terms of co-publications, i.e. the overall number of contacts to distinct other scientists, is beneficial for academic performance. In contrast, they refute the hypothesis that the degree of influence a scientist holds within their community, as well as their inclusion in particular social circles, plays an equally significant role. In fact, all else being equal, influence and prestige have a negative relationship with academic performance. With regard to our third hypothesis, we find that accessibility is a key explanatory factor that deserves more attention, while a brokering position does not seem to play a major role.

As for our fourth hypothesis, we find that the general propensity of scientists to attend and present at conferences is a highly significant explanatory factor, albeit a substitutive rather than a complementary one. In line with our fifth hypothesis, we indeed find that it is an individual's central role in regular meetings, rather than their frequency of attendance, which accounts for this explanatory power. While the frequency of attendance remains positive in its effect, this effect is an order of magnitude lower than that of degree centrality and eigenvector centrality. The effect of the number of published conference proceedings is actually negative.

Finally, a number of relevant complementary factors can be identified. In line with prior expectations, overall publication output, scientific age, male gender, and university (rather than private-sector) affiliation consistently show positive coefficients across all models (not shown in the tables).

Robustness checks

Robustness testing was conducted in various ways and is documented in the annex to this paper. As outlined above, the two main analytical perspectives (networks of co-publication vs. networks of co-presence) were intentionally considered separately before being combined in one model. Moreover, correlation analysis of the independent variables and VIF checks were run for all models. Overall, we find that, after standardisation and logarithmic transformation, the structure of the dataset supports all assumptions required for valid OLS regressions. Unsurprisingly, the findings therefore prove robust when generalised linear models (GLM) are applied instead. Such approaches would, however, take up an undue amount of computing power on a sample this large and, more importantly, are more difficult to interpret directly than the OLS models we chose.

To ascertain that the effects do not merely result from empirical idiosyncrasies caused by including too many variables of limited relevance, we re-estimated the models by means of stepwise regression and found no deviations of substantial relevance.

Furthermore, the global models documented above were re-run on specific segments of the overall population to rule out that the effects are unspecific aggregates of diverse, potentially countervailing sub-dynamics, or simply reflect the fact that conference proceedings are a more common vehicle of communication in some disciplines than in others. For example, the models were re-run after excluding all non-co-publishing authors from the sample, and on a core co-publication network that retains only links with a weight greater than five. In addition, separate analyses were run for the hard sciences, the soft sciences, and the medical domain. Likewise, to control for field-specific idiosyncrasies, all models were re-run with variables for scientific disciplines, i.e. variables reflecting the distribution of each author's publications across all 27 Scopus ASJC subject areas.
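Two of these robustness variants can be sketched briefly: restricting the co-publication network to its core, and describing each author's field profile as publication shares across ASJC subject areas. The sketch assumes that a link's weight is the number of joint publications and uses hypothetical author IDs and ASJC-style abbreviations; the function names are illustrative.

```python
from collections import Counter


def core_network(edges, min_weight=5):
    """Keep only links whose weight (assumed: number of joint publications)
    is strictly greater than min_weight."""
    return {pair: w for pair, w in edges.items() if w > min_weight}


def field_shares(subject_areas):
    """Share of an author's publications falling into each subject area."""
    counts = Counter(subject_areas)
    total = sum(counts.values())
    return {area: c / total for area, c in counts.items()}


# Hypothetical weighted co-publication links and one author's publication profile
edges = {("A1", "A2"): 7, ("A1", "A3"): 2, ("A2", "A4"): 6}
core = core_network(edges)
shares = field_shares(["PHYS", "PHYS", "CHEM", "ENGI"])
print(core, shares)
```

Unlike a single field dummy, the share variables let authors who publish across several subject areas contribute to each of them in proportion to their output.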

While specific coefficients and some levels of significance naturally differed as a result of such substantial interventions, the overall structure, mutual relation, and significance of detected effects remained largely unaffected. Against this background, the findings presented above can be considered robust and unaffected by purely technical issues as well as by problems related to conceptual validity.

Discussion

In the context of the existing literature, our findings offer interesting additional perspectives. Contrary to some recent studies (Xu & Chang, 2020; Hâncean et al., 2021; Fassina Santini et al., 2021), our analysis suggests that hierarchy and prestige in the formal co-publication network may matter less for the quality of output than commonly assumed. Being part of specific “inner circles” may matter even less. To some extent, this resonates with earlier findings that “tiny and sparse networks” result in more citations than “crowded” ones (Ortega, 2014). Instead, the strong, even dominant role of closeness centrality suggests that, on the whole, the diversity and breadth of knowledge an author has at their disposal is more relevant than whether they are connected with other “stars”. This aligns with earlier findings that diversity of different kinds may be conducive to quality of output (Nomaler et al., 2013; Persson, 2010), while high prestige and influence may, under controls, even prove detrimental (Abbasi et al., 2011).

By and large, an analysis of formal collaboration in the academic domain thus seems to reflect a world in which the accessibility of knowledge (even if spread over several steps) is more important for the resulting quality of research than issues of hierarchy (prestige) or control (brokerage). Our analysis of co-publication networks provides more evidence of a knowledge-driven republic of science than one might expect and than some prior literature suggests. In particular, it is remarkable that being part of delineated subnetworks is hardly significant and that, where it is significant, the association tends to be negative rather than positive. To test this finding further, we additionally considered alternative network measures that also capture substructures of networks, such as k-cores and p-cliques, none of which revealed a relevant association. Likewise, the role that some studies have attributed to brokerage (Abbasi et al., 2012; Tajedini et al., 2019) could not be confirmed as relevant for quality, a doubt that had already been raised by prior research (Bordons et al., 2015).

At the same time, our analysis confirms that informal interaction indeed matters, as earlier studies had anecdotally suggested (Growe, 2018; Henn & Bathelt, 2015; Robertsson & Marjavaara, 2014; Torre, 2015). The seemingly self-evident world of scientific hierarchies does indeed exist and carries weight. With a view to inferred co-presence at conferences, hierarchy or prestige adds substantial explanatory value to basic network positions. In contrast, having access to many others via second-, third-, fourth- and higher-order connections has not a positive but a negative effect. Arguably, “meeting indirectly” is not a particularly meaningful concept for the practice of meeting in person, so that closeness measures lack a meaningful interpretation in this context. Instead, in some constellations they may indicate a position slightly outside the main core, which offers high closeness to many actors yet direct access to none. To an extent, this interpretation may help to contextualise the otherwise counterintuitive negative effect of closeness.

In summary, the abovementioned contrasting images might best be interpreted as a reflection of two central processes within science which are superimposed and, despite their different nature, intricately connected.

The first one relates to meetings at conferences and concerns the initiation of high-quality research by scientific leaders, which therefore quite naturally resonates with notions of hierarchy and high-level interfaces. At this level of hierarchy and seniority, networks relate to negotiation and generally focus more on the agreement of deals and strategic partnerships than on the transmission of specific, thematically relevant knowledge.

The second one relates to high-quality formal collaboration, which is usually carried out either by junior team members or by second-tier faculty who thoroughly research the extant literature. At this stage of their career, researchers predominantly access knowledge through formal documentation rather than through their personal networks. Moreover, it is not uncommon for senior researchers to fall back into this mode when formally working on a paper.

That said, it stands out that brokerage and being part of subnetworks or clusters, despite being painstakingly included in this analysis, seem to matter little, if at all. This markedly qualifies earlier discussions on the role of gate-keeping in science and could suggest that at least the quality and visibility of publishing scientists' work is not directly affected by such gate-keeping.

Arguably, our findings also suggest that we need to critically reconsider what traditional co-publication networks really reflect. To an extent, our standard assumption that hierarchies matter rests on the premise that high-quality research is produced by senior professors within a core network of equally established scientists. In practice, however, high-quality research is also commonly produced by young high performers who are free of administrative and other tasks and have more time to think and research broadly. Senior scientists, on the other hand, are almost automatically embedded in dense networks through the supervision of PhD students and their formal responsibility for facilities, which often makes them co-authors on work led by others. Such formal participation, however, does not necessarily improve the average quality of their output, since not all projects within an institute can be of high quality and not all PhD candidates can be high potentials.

On a final note, it seems remarkable that—across all models—the different aspects of networkedness we considered are notably more effective in explaining an author’s successful access to relevant journals than in explaining the actual field-corrected impact of their individual publications (as per their crown indicator). This, too, may be considered a reflection of the implicit structures of gate-keeping and hierarchy (Cugmas et al., 2020) which seem to be disconnected from quality as such.

Conclusions

As a core methodological contribution, this paper provides a first approach towards operationalising the informal dimensions of interaction in science, which had previously been discussed theoretically but never captured broadly by empirical means.

In our subsequent analysis, we find that personal interactions at conferences can indeed be considered a relevant predictor of publication quality, albeit in quite a different manner than traditional co-publication networks. At the same time, considering traditional co-publication measures and our novel approach in parallel does not markedly increase the explanatory power of our models. Arguably, this is because both constitute two rather different, yet in substance inseparable sides of the same coin. Overall, our analysis leaves us with a positive impression of the formal perspective on publications: quality appears to depend less on hierarchies and exclusive circles than some might suggest. At the same time, analyses based on our newly developed measure underline that these realities are far from irrelevant. They may simply manifest themselves, and become relevant, in a different manner than commonly assumed.

Methodologically, our findings caution against an overly direct and detailed interpretation of co-publication networks based on ever more refined measures without sound prior checks of the validity of specific aspects and measures. At the same time, they encourage a deeper quantitative exploration of the informal domain. As mentioned above, further research will be needed to improve the operationalisation of informal interactions as new sources of data become available, in order to confirm or contextualise this paper's findings on conference visits. In the future, such detailed inquiries may enable us to better distinguish effects that stem from the limitations of the data from those that are of substance. In particular, international comparative studies of different science systems could add relevant perspectives.

From a science policy perspective, our findings support efforts to better include young scientists in fora of informal knowledge exchange and to avoid any further nurturing of hierarchical structures which seem to naturally emerge in research fields. Instead, the informal domain should be increasingly considered as a forum to transform formal second- and third-order access (i.e. knowledge about relevant people) into direct, personal contacts at various levels. This may well encourage more creativity and serendipity among those who de facto create high quality publications.