A new approach to analyzing patterns of collaboration in co-authorship networks: mesoscopic analysis and interpretation
This paper focuses on methods to study patterns of collaboration in co-authorship networks at the mesoscopic level. We combine qualitative methods (participant interviews) with quantitative methods (network analysis) and demonstrate the application and value of our approach in a case study comparing three research fields in chemistry. A mesoscopic level of analysis means that in addition to the basic analytic unit of the individual researcher as node in a co-author network, we base our analysis on the observed modular structure of co-author networks. We interpret the clustering of authors into groups as bibliometric footprints of the basic collective units of knowledge production in a research specialty. We find two types of coauthor-linking patterns between author clusters that we interpret as representing two different forms of cooperative behavior, transfer-type connections due to career migrations or one-off services rendered, and stronger, dedicated inter-group collaboration. Hence the generic coauthor network of a research specialty can be understood as the overlay of two distinct types of cooperative networks between groups of authors publishing in a research specialty. We show how our analytic approach exposes field specific differences in the social organization of research.
Keywords: Network analysis; Co-author networks; Scientific communication; Chemistry
Scientific fields differ in their intellectual and social organisation, and consequently in their communication practices (Whitley 2000; Knorr Cetina 1999; Fry and Talja 2007). Investigating these differences via ethnographic studies alone is insufficient because evidence is gathered from only a small, local fraction of scientists in a field. In contrast, a bibliometric approach that analyzes large sets of publication data provides access to aggregate behavioural patterns of a cross-section of scientists in a research field. However, because of the standardization of formal scientific publishing across science, the bibliometric approach employed alone may fail to uncover underlying, field specific differences in scientific communication practices. In our comparative research into field-specific scientific communication cultures we combine qualitative ethnographic field studies with structural analysis of publication networks. In this manner, we can use small-scale, nuanced evidence from field studies to inform our interpretation of large-scale publication networks and investigate to what extent field specific practices are reflected in structural features of publication networks. This evolves a tradition of close-up analysis of scientific networks and communication practices started by Crane’s work (1972) on invisible colleges, and taken up more recently by Zuccala (2006).
Our specific concern is to understand the relevance and meaning of certain network features (such as co-author links and inter-citations) in the context of specific scientific fields, and to discover how they relate to concepts of community, and to communication practices that we observe in our field studies. The field specificity of global bibliometric indicators such as citations and journal impact factors (Moed et al. 1985; Althouse et al. 2008), the number of co-authors on a paper (Liberman and Wolf 1998), or the percentage of self-citations (Snyder and Bonzi 1998) is well known. However, the delineation of scientific fields, and hence the appropriate normalisation of such measures for comparisons across scientific fields remains problematic (Zitt et al. 2005; Adams et al. 2008). Recently, the characteristics of co-author networks as social networks have been highlighted (Kretschmer 1994; Newman 2001b). They share with other social networks global topological properties such as small world-property, clustering, and assortative degree mixing as well as a long-tail degree distribution and a scaling law for the clustering coefficient (Sen 2006). In the recent surge of work on the analysis of complex networks, and in particular on clustering algorithms to extract the modular structure of real world networks, co-author networks rank among the most prominent examples (e.g. Radicchi et al. 2004; Newman 2004; Palla et al. 2005). Most work in this area does not focus on analysis of network features specific to a scientific field. Instead, existing work uses co-author networks as a way to demonstrate algorithmic advances by comparing global network properties of co-authorship networks to other types of ‘real world’ networks (such as transportation or biological networks).
Those works in information science and bibliometrics that investigate the characteristics of co-author networks in specific fields from a social network perspective tend to focus on global topological network properties, or on the position and ranking of individual actors within the network, see e.g. Kretschmer (2004), Acedo et al. (2006), or Wagner and Leydesdorff (2005). See Liu et al. (2005) for an exception that at least touches on the group level organization. Co-author network analyses at the global (or the individual author) level fail to recognize the team-based organization of research in most scientific fields. As Guimera et al. (2007) point out for networks with a modular substructure, global properties fail to capture important structural and functional distinctions. To discover those one has to investigate the mesoscopic structure of networks that takes into account module interconnectivity and differences in connectivity patterns between nodes based on their structural position in the network. Accordingly the study presented in this paper focuses on the mesoscopic level of analysis—that is we analyze connectivity patterns between modules of closely interconnected authors in co-authorship networks in order to explore the field-specificity of community structures and communication patterns.
Early on in our research we found it problematic to interpret co-author links between co-author groups as indicators of collaboration between those groups. We realised that in the weakly interconnected field that we were studying, co-authorship links between groups did not indicate direct inter-group collaboration but were merely residues of individuals migrating between groups along their career path, e.g. from PhD student to postdoc, an observation also made by Nepusz et al. (2008). In this paper we develop an approach to distinguish the different types of connectedness that co-author links between groups represent, such as inter-group collaboration, career migration, or exchange of services or samples. To achieve this we match observations of structural features of co-author networks at the mesoscopic level with the accounts of our field study participants on the underlying scenarios of interaction and co-authorship. We then proceed to compare collaboration patterns of three research specialties in chemistry as they are reflected by the mesoscopic structure of their co-author networks.
A distinguishing aspect of our research is the manner in which qualitative and quantitative approaches are closely interlinked. The study presented here exemplifies the way in which both approaches interact: qualitative understanding of research practices and context helps us not only to evaluate quantitative outcomes but also to further develop our quantitative methods, and quantitative results direct our attention for further qualitative study. Take for example the issue of analyzing the community structure of social networks, which is an important aspect of our research into communication practices. As described further below, a wide variety of clustering algorithms exist, and they deliver quite different partitions of a network. To decide whether author groupings obtained from clustering of a co-author network provide us with a meaningful research unit for the analysis of scientific communication and interaction patterns of a research field, we need validation and interpretation of those clusters in the context of this specific field (Caruana et al. 2006; Schaeffer 2007). Given such a clustering we can then look at the network of clusters that shows the interactions between author groups and direct our qualitative research efforts at further understanding those interaction patterns, that is, at uncovering the motivations and processes that underlie the structural patterns (Lievrouw 1990; Zuccala 2006).
To develop a deep qualitative understanding of communication practices in different fields of chemistry we are conducting ethnographic field studies in selected research groups in Europe and the USA. We will call these groups seed groups in the remainder of this paper.1 A precondition for selecting a group for our field study is that it is an internationally recognized player in some specialised field of research, below the sub-discipline level. The publication data sets used in this study represent those research specialties in which our seed groups are actively engaged. As a result, for each data set we have access to informants with whom we can check the interpretation and significance of features that we detect in our network analysis.
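Table 1 (row labels only; values not recovered): average number of authors per paper (median); number of authors (reduced); number of authors in giant component (relative size); number of clusters in giant component; average number of authors per cluster (median).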
The field studies helped us to understand the research activities our seed labs were involved in, the social organization of the research groups, and the experiences people had had in collaborations with other groups. Field visits with observations and interviewing of group members lasted between 4 and 6 weeks per group. Altogether we visited 5 different groups. Further, we acquired specific feedback to help interpret the structural features we were extracting from the co-author networks. We conducted these feedback interviews with the PI or another senior researcher in a group. The development of the analysis presented here is based on feedback interviews with 5 different researchers, amounting to about 15 h of conversation, partially audio recorded, sometimes documented by extensive note taking.
The nature of the node ‘communities’ that are extracted from a network depends on the clustering algorithm that is used (Fortunato and Castellano 2007). Some clustering procedures are hierarchical by design, so that they offer a hierarchy of groupings, starting from single nodes to the total set. Newman (2004) shows how his non-hierarchical algorithm can still be used to repeatedly drill down into a large, 56,000 author data set, first detecting clusters at the level of entire fields (such as high-energy physics, or condensed matter physics), then smaller specialities, and eventually after a fourth iteration, clusters corresponding to research groups (such as his own with 28 members). Other algorithms have a very restrictive definition of clusters as very tight groupings of nodes, e.g. the concept of k-cliques in Palla et al. (2005), and therefore produce clusters with little variation of group internal structures.
We use an information-theoretic clustering algorithm for undirected, integer-valued networks (Rosvall and Bergstrom 2007) to partition the co-author network into clusters of closely interconnected co-authors. We have chosen this algorithm because we have found that, at least for the few seed groups about which we have some in-depth knowledge from our field studies, it extracts very well what Seglen and Aksnes (2000) have called ‘functional research groups’, that is, basic research collectives that contain not only a collocated group of researchers in a laboratory led by a principal investigator (PI), but also closely cooperating domestic or international colleagues and visiting scientists.
We extract the giant component of the co-author networks. Then we use pajek (Batagelj and Mrvar 2003) to extract and visualize the cluster that includes the PIs of our seed groups, as well as some of the clusters in their neighbourhood. We validate the interpretation of clusters in interviews with our informants.
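As an illustration of the first step, the following minimal sketch extracts the giant (largest connected) component of a co-author network given as a list of author pairs. It is not the Pajek procedure used in the study; the function name `giant_component` and the toy edge list are hypothetical.

```python
from collections import defaultdict, deque

def giant_component(edges):
    """Return the set of nodes in the largest connected component.

    `edges` is an iterable of (author_a, author_b) co-author pairs.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    largest = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(largest):
            largest = component
    return largest

# Hypothetical toy network: a triangle of co-authors and one isolated pair.
coauthor_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]
giant = giant_component(coauthor_edges)
```

Authors who never co-publish do not appear in the edge list and are therefore ignored by this sketch.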
Using standard network centralization indices (degree, closeness, betweenness; Freeman 1978) calculated with pajek, we compare the internal structure of selected clusters. We further explore the composition of clusters using the approach by Guimera et al. (2007) to distinguish different types of nodes based on their cluster-internal and cluster-external links. The classification distinguishes seven node roles. It distinguishes between hub-nodes and non-hub nodes based on the number of cluster internal links a node has. Further subtypes of hubs and non-hubs are defined based on the distribution of their external links to other clusters (see method section in Appendix for details).
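The two quantities on which role classifications of this kind are usually built can be sketched as follows: a within-cluster degree z-score, which separates hubs from non-hubs, and a participation coefficient, which describes how a node's links are spread over clusters. This is a hedged illustration rather than the computation used in the study. The function name `node_role_scores` and the toy data are hypothetical, links are counted unweighted, and the thresholds that map the two scores onto the seven roles are given in the Appendix and are not reproduced here.

```python
from collections import defaultdict
from math import sqrt

def node_role_scores(edges, cluster_of):
    """Return {node: (z, p)} for every node in `cluster_of`.

    z: within-cluster degree z-score (internal links relative to
       the other members of the same cluster).
    p: participation coefficient, 1 - sum over clusters of
       (links to that cluster / total links)^2.
    """
    # links_to[node][cluster] = number of links from node into cluster
    links_to = defaultdict(lambda: defaultdict(int))
    for a, b in edges:
        links_to[a][cluster_of[b]] += 1
        links_to[b][cluster_of[a]] += 1

    members = defaultdict(list)
    for node, cluster in cluster_of.items():
        members[cluster].append(node)

    scores = {}
    for cluster, nodes in members.items():
        internal = [links_to[n][cluster] for n in nodes]
        mean = sum(internal) / len(internal)
        std = sqrt(sum((k - mean) ** 2 for k in internal) / len(internal))
        for node, k_in in zip(nodes, internal):
            z = (k_in - mean) / std if std > 0 else 0.0
            total = sum(links_to[node].values())
            if total > 0:
                p = 1.0 - sum((k / total) ** 2 for k in links_to[node].values())
            else:
                p = 0.0
            scores[node] = (z, p)
    return scores

# Hypothetical toy data: a star-shaped cluster 0 around "h", with one
# member ("a") also linked to cluster 1.
toy_edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"),
             ("a", "x"), ("x", "y")]
toy_clusters = {"h": 0, "a": 0, "b": 0, "c": 0, "d": 0, "x": 1, "y": 1}
role_scores = node_role_scores(toy_edges, toy_clusters)
```

In the toy data the centre of the star has by far the highest z-score and a participation coefficient of zero, while the member with one internal and one external link has a participation coefficient of 0.5.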
For analysis of the linking patterns between clusters we focus on the neighbourhood of a seed group within the original giant component of the author-level network—that is, we extract the sub-network that contains all authors of the seed-group and all authors of those clusters that are linked by at least one co-author link to the cluster of the seed group. We visualize these sub-networks in pajek and inspect the connection patterns between the seed cluster and its neighbours.
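The neighbourhood extraction described above can be sketched in a few lines, assuming the clustering is available as a mapping from author to cluster label. The function name `seed_neighbourhood` and the toy data are hypothetical, and the sketch stands in for, rather than reproduces, the Pajek workflow.

```python
def seed_neighbourhood(edges, cluster_of, seed_cluster):
    """Return (authors, sub_edges) for the seed cluster and every
    cluster linked to it by at least one co-author link."""
    neighbour_clusters = {seed_cluster}
    for a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca == seed_cluster:
            neighbour_clusters.add(cb)
        elif cb == seed_cluster:
            neighbour_clusters.add(ca)

    # Keep all authors of the selected clusters and the links among them.
    authors = {n for n, c in cluster_of.items() if c in neighbour_clusters}
    sub_edges = [(a, b) for a, b in edges if a in authors and b in authors]
    return authors, sub_edges

# Hypothetical toy data: "near" touches the seed cluster, "far" does not.
toy_edges = [("s1", "s2"), ("s1", "n1"), ("n1", "n2"),
             ("n2", "f1"), ("f1", "f2")]
toy_clusters = {"s1": "seed", "s2": "seed", "n1": "near",
                "n2": "near", "f1": "far", "f2": "far"}
authors, sub_edges = seed_neighbourhood(toy_edges, toy_clusters, "seed")
```

Note that links among the neighbouring clusters themselves are retained, so the sub-network shows how the neighbours of the seed group relate to one another as well as to the seed group.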
We then ask our informants from the seed groups to review their neighbourhood network and to tell us about their scientific relationship to the neighbouring clusters, and how the co-author links have come about. In those interviews typical scenarios emerged that we could match with typical linking patterns as described below in the “Results” section. They are the foundation for the classification of between-cluster connections in the next step.
Based on our observations in the previous step, we classify links between clusters into two types and build from each type a cluster-level network in which nodes represent clusters. We regard two clusters as connected by a transfer type link if, in the underlying co-author network, the two clusters are no longer connected by co-author links once one or two author nodes are (hypothetically) removed from the network. In all other cases we classify the between-cluster linking as collaboration. In both cases we assign to the cluster-level link a weight equal to the sum of the weights of the underlying co-author links.
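The classification rule can be made concrete with a short sketch. For each pair of clusters it collects the co-author links that run between them, tests by brute force whether removing one or two authors would cut all of those links, and sums the link weights. The function names and the toy data are hypothetical, and the sketch assumes that "no longer connected" refers to the direct co-author links between the two clusters.

```python
from collections import defaultdict
from itertools import combinations

def removal_cuts_all(links, removed):
    """True if every link has at least one endpoint in `removed`."""
    return all(a in removed or b in removed for a, b, _ in links)

def classify_cluster_links(weighted_edges, cluster_of, max_removed=2):
    """Return {frozenset({c1, c2}): (link_type, weight)}.

    `weighted_edges` holds (author_a, author_b, weight) triples.
    A cluster pair is 'transfer' if removing at most `max_removed`
    authors cuts all co-author links between the two clusters,
    otherwise 'collaboration'.
    """
    between = defaultdict(list)
    for a, b, w in weighted_edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb:
            between[frozenset((ca, cb))].append((a, b, w))

    result = {}
    for pair, links in between.items():
        endpoints = {n for a, b, _ in links for n in (a, b)}
        is_transfer = any(
            removal_cuts_all(links, set(combo))
            for size in range(1, max_removed + 1)
            for combo in combinations(endpoints, size)
        )
        weight = sum(w for _, _, w in links)
        result[pair] = ("transfer" if is_transfer else "collaboration", weight)
    return result

# Hypothetical toy data: one author of X links to Z (transfer), while
# three different authors of X each link to a different author of Y.
toy_edges = [("x1", "z1", 2), ("x1", "y1", 1),
             ("x2", "y2", 1), ("x3", "y3", 1)]
toy_clusters = {"x1": "X", "x2": "X", "x3": "X",
                "y1": "Y", "y2": "Y", "y3": "Y", "z1": "Z"}
cluster_links = classify_cluster_links(toy_edges, toy_clusters)
```

Trying all single authors and all pairs of authors is quadratic in the number of linked authors per cluster pair, which is acceptable for a sketch; a matching-based test would scale better on dense inter-cluster linkages.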
We conclude our investigation by comparing systematically the empirical data for three research fields with regard to the properties of their population of clusters and the properties of their transfer and collaboration networks.
Basic components of the mesoscopic structure of co-author networks
Interpretation of clusters
In interviews with the PIs of the seed groups we validated that their respective cluster extracted from the co-author network captures their immediate, collocated research groups (researchers of various seniority levels from PhD students over postdocs to deputies or subgroup leaders) plus relevant external cooperation partners (from their own institution, or national and international cooperations). Occasionally the PIs were slightly surprised that a certain individual was subsumed into their cluster and not independently represented. This particularly seems to be the case for those individuals that have a strong institutional identity distinct from the academic group of the PI—as is the case for research group leaders from industrial companies. Note that our data sets capture only publication activity in very specialised areas of research—many PIs and their groups in our field study engage in several distinct research specialties. Consequently the representation of research groups by co-author clusters in this study is only partial, as it includes only those group members that are involved in the specific research specialty that we targeted when defining our data set.
The node role analysis underlines this observation of differences in cluster organization. Based on the internal linking structure of a cluster it distinguishes hub-nodes and non-hub nodes. The PI-led group depicted in Fig. 2 is characterized by having only one single hub-type node, whereas the other two clusters show several hub-type nodes. From investigation of institutional affiliations given in the underlying WoS records and participant feedback, we identify one of these latter clusters as an international collaboration network, and the other as a network of closely cooperating colleagues from a major research institute in this research specialty.
Between-cluster linking patterns
In the list below, we match these connection patterns to the commentary of one of our informants on the real world scenarios underlying those co-author links. Numbers in parentheses identify the corresponding pattern in the networks in Fig. 3. These examples are taken from the seed group in field C—similar patterns have been observed and validated for the other fields.
visiting scientist with links home into national research network (1)
career migration (2)
repeated instances of career migration (2b)
career migration of now closely collaborating colleague (2*)
1-off commissioned work (3)
1-off cooperations on independent topics (3b)
funded project collaboration of subgroup leader (7)
‘unauthorized’ collaboration by postdocs (4)
provision of synthesis samples (5)
exclusive cooperation by closely collaborating colleague (6)
intensive thematic and methodological international (Swiss) cooperation since PI’s postdoc time (8)
very broad, thematic based, many faceted international (UK) cooperation (includes career migration and temporal exchange of postdocs and PhD students) (9)
many-faceted methodological international (Dutch) cooperation (10)
cooperation on many topics due to complementary knowledge, joint PhD students and exchange of staff, many funded projects (11)
PI colleague at same institute—intensive topical and methodological cooperation; much stronger informal cooperation than formal in form of publications (12)
multi-faceted cooperation with national institute, exchange of senior staff who bring along their networks (13)
Thematic cooperation, nurtured by EC funding, no exchange of PhD students; people come for measurements and leave again; disciplinary very distant (electrical engineers) (14)
Seed group conducts measurement services to this group, only subgroup of instrument involved, institutionally supported by partner agreement between institutions, PI former postdoc in seed group (15)
Russian institute, seed group is partially fused with this institute as several members are partially funded by seed group, repeated exchange of coworkers (16)
Case study: comparison of mesoscopic structure for three research specialties
The purpose of this case study of three specialty fields within chemistry is to demonstrate the kind of comparative analysis our approach enables and the sensitivity of this approach for uncovering differences in the collective organization of scientific research specialties.
Properties of cluster populations
As reported in Table 1, the clustering algorithm extracts from the giant component of field A 2005 clusters, from field B 578 clusters, and from field C 1191 clusters. The following properties refer to these cluster populations that make up the giant component of the network in each field.
For further analysis of the associations between various cluster properties and how they vary across fields, we group the clusters by size as either small (n ≤ 10 authors), medium (10 < n ≤ 40), or large (40 < n). To investigate the temporal composition of the cluster population for each field we divide the time period covered by our data sets into three slices of approximately equal length. We distinguish four age cohorts of clusters based on the publishing activity of their authors during these time slices: continuous—cluster authors have published during all three time slices; recent: publishing activity only during the latter two time slices; new: publishing activity only during the last time slice; extinct: active in first and/or second time slice, but not in most recent time slice.
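The size classes and age cohorts defined above translate directly into two small functions. The sketch assumes that a cluster counts as active in a time slice when at least one of its authors published in that slice. The function names are hypothetical, and activity patterns that the definitions do not cover (for example, activity in the first and third slices only) are returned as "unspecified" rather than guessed.

```python
def size_class(n_authors):
    """Map a cluster size to 'small', 'medium', or 'large'."""
    if n_authors <= 10:
        return "small"
    if n_authors <= 40:
        return "medium"
    return "large"

def age_cohort(active):
    """Map per-slice activity flags (first, second, third) to a cohort.

    Each flag states whether any author of the cluster published
    in that time slice (an assumption of this sketch).
    """
    first, second, third = active
    if first and second and third:
        return "continuous"
    if not first and second and third:
        return "recent"
    if not first and not second and third:
        return "new"
    if not third and (first or second):
        return "extinct"
    # Patterns not covered by the definitions in the text.
    return "unspecified"

# Hypothetical example: a 25-author cluster first active in the second slice.
example = (size_class(25), age_cohort((False, True, True)))
```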
Figure 4 suggests for field A an association of cluster size with membership in a specific age cohort; scatter plots for fields B and C (not shown here) display the same trend. Indeed, taking the data of fields A, B, and C together, we find a statistically significant association between the categorical variables of cluster size and cluster age. Whereas overall 50.3% of clusters are small, for the age cohort of continuous clusters the proportion of small clusters decreases to 35.5%.
As depicted in row 2 of Fig. 5, all fields have only a small percentage of extinct clusters (3–4%). The most numerous age cohort is the continuous clusters (48–74%), the second largest the recent clusters (19–40%), and the smallest surviving cohort the newcomers (5–11%). In comparison, field B has relatively more recent clusters than fields A and C, field C has relatively more new clusters than fields A and B, and field A has the largest proportion of continuous clusters. Note that these statements refer to a clustering obtained post hoc from data accumulated over the entire time period. Hence these observations do not translate directly to growth in terms of the formation of new clusters. If, for example, a scientist moves on after a postdoctoral position with a group and over the course of her or his career becomes the seed of a new group that is identified by our post hoc clustering, this new cluster will still be categorized as continuous and not new, because at least one of its members (the former postdoc) has been publishing in the field all along. Nor do these statements on the relative sizes of cluster age cohorts translate directly to growth in terms of the entrance of new authors into the field. This can be seen by comparing with Fig. 1, which shows that all three fields have gained the largest portion of new authors during the most recent time slice, whereas the largest portion of clusters are continuous clusters.
Collaboration and transfer networks
Following our classification of between-cluster links into transfer type links (meaning that the hypothetical removal of one or two nodes results in the two clusters no longer being connected) and collaboration type links (all other cases), we obtain for each field two cluster-level networks that differ greatly in size: a transfer network, and a much smaller collaboration network. Of all between-cluster links, only a very small percentage qualifies as collaboration type links: for fields A and C these are 4.7% and 5.4%, respectively, about twice as many as in field B with 2.4%, see row 5 in Fig. 5. Whereas for all three fields almost all clusters of the giant component are part of the transfer network, the fraction of clusters that are involved in the collaboration network is considerably smaller: for field A 28.8% (=560 clusters), for field C 30.5% (=348 clusters), and for field B 8.7% (=49 clusters); see row 4 in Fig. 5. The collaboration networks of fields A and C have a giant component, whereas the collaboration network of field B is fragmented into several unconnected components of similar size.
To get an idea of the local densities of the collaboration networks, we look at the degree distribution for the cluster nodes in the collaboration networks. Whereas the median degree of the clusters is 2 for all three fields, the average node degree for fields A and C is higher (3.3 and 3.1, respectively) than for field B (2.2), indicating a more strongly right-skewed distribution. Indeed, the highest cluster degree in field B is 7, whereas in fields A and C, 10.6% (37) and 6.3% (126) of clusters, respectively, have degrees higher than 7, going up to a degree of more than a hundred for one of the clusters in field C. On inspection of the names of the most productive authors in the high-degree clusters with more than 20 collaboration links we find almost exclusively common Chinese and Korean names, an observation we will come back to in the “Discussion” section.
Fields A and C show a higher local density of their transfer networks as well: the medians of cluster degrees are 15 and 11, respectively, whereas the median for field B is 6. The average cluster degrees for fields A and C are about 29 and 21, whereas for field B the average cluster degree in the transfer network is only about 8.
The analysis of mesoscopic network features provides us with insight into the underlying social configurations and processes of scientific collaboration and cooperative behaviour, and generates new research questions that we will highlight in the following sections.
Interpretation of basic mesoscopic features of co-author networks
We observe that the basic collective units performing research in the specialties that we are studying differ substantially in structure and composition. The uni-centred hierarchical footprint of a PI-led group (corresponding to the functional research groups that Seglen and Aksnes (2000) identified in their case study on microbiology) contrasts with the multi-centred research clusters. These latter clusters have lower centralization indices and contain multiple hub-type nodes. They correspond to multiple-PI collaborations representing intra-institute, national, or international research networks. We conclude from this observation that a variety of social configurations and collaborative working modes contribute to research in a specialty.
Our structural analysis of between-cluster co-author links and how they match to underlying real world scenarios points to a fundamental difference between 1-1, 1-m or 2x (1-m) co-authorship links on the one hand, and m-m co-authorship links on the other hand. These basic types imply different forms of interaction and collaboration. The former are footprints of people moving between labs and of one-off cooperations involving sample exchange or measurement services rendered.5 According to our informants’ accounts, migration events can take different forms that are more or less strategically influenced by the interests of the home lab: (1) reciprocal exchange of students or young researchers between complementary groups, (2) targeted acquisition of training in a particular domain or skill that a young researcher brings back to his home lab when returning after a postdoc stint at another lab, or (3) career migration where the individual student or young researcher chooses rather independently what group to join next to evolve his or her career—personal recommendations of the current PI and his or her network of acquaintances, as well as the reputation and resources of the new lab influence such decisions. Hence this type of inter-group connections that we have termed ‘transfer links’ constitutes a group-level network of exchange of people, skill, material, and measurement services within a specialty.
The m-m connection patterns on the other hand correspond to substantial inter-group collaborations. They are substantial in the sense that they are recognized by the leaders of our seed groups as inter-group collaborations worth mentioning when reflecting on their scientific career and evolution of research interests. Such inter-group collaborations include several group members, result in several publications, and extend over some extended time period. Distinguishing between those two basic structural types enables us to investigate and compare collaboration practices in scientific fields in these two different dimensions.
We expect that the rigorous structural distinction between the two basic types will occasionally lead to an erroneous classification of underlying collaboration scenarios. For example, an inter-group collaboration may take the misleading structural form of a 2x (1-m) transfer link in a situation where an intensive inter-group collaboration includes only two members from one group; for a small group, two authors may correspond to a substantial fraction of that group (a realistic scenario would be a small independent theory group collaborating with a larger experimental group). We suspect that this kind of classification error is negligible: the distribution of linking types depicted in row 5 of Fig. 5 indicates that in the fields we are studying the 2x (1-m) structural pattern occurs only in about 10% of the cases overall. Furthermore, the large majority of the cases that we encountered in the analysis of the neighbourhood networks of our seed groups represented traces of two separate migration events between the clusters and were not the footprint of inter-group collaboration.
Based on these interpretations of basic mesoscopic features of co-author networks we will proceed with a comparative study of collaboration patterns reflected by co-author networks of three research specialties. We will first discuss generic findings that hold for all three fields, before we address the question what we learn about differences between the fields.
Mesoscopic structure of co-author networks in chemistry: generic findings
The pie chart table in Fig. 5 gives a systematic overview of the properties of the cluster populations in the three fields. There are a number of generic findings that hold for all three fields. First, cluster size is associated with cluster age, generally the older the cluster the larger the cluster. For the most part, this is an artefact of an accumulative data set, since we do not delete nodes or links from the network after a certain period of ‘inactivity’. Hence especially for the older clusters, we do not capture the clusters as instantiated at any given point in time, but include a halo of authors who have in the meantime left the group and perhaps the field, or even science. So increasing the temporal resolution and studying the temporal evolution of clusters should give better insight into what constitutes genuine growth, and a genuinely large cluster in a research specialty at a given time, and whether there are any additional selection effects at play that make small clusters more likely to discontinue than large ones. A further line of inquiry for future work is to look at the substantial portion of continuous clusters (between 30 and 45% depending on the field) that remain small although they have been active over the entire time period.
Second, the hubness of clusters is also associated with cluster size; the larger the cluster, the more likely it is that it will include a hub node, or for very large clusters even several hub nodes. Whereas the interpretation of a single-hub cluster as a research group led by a PI seems straightforward, the question arises about what social configurations and collaborative working modes underlie the small non-hub clusters and the large multi-hub clusters. Our initial observation of multi-centre structures (see Fig. 2) that correspond to a specialised research institute and a closely collaborating international research network exemplifies possible scenarios for multi-hub configurations of clusters. Hence, we take the fraction of multi-hub clusters in a field as a first, rough quantitative estimate of the frequency of either institutional or international multi-centre networks in the three fields: 2.6% in field B, and 9.4% and 9.5% in fields A and C, respectively. We speculate that small, hubless clusters on the other hand may represent small-scale, informal collaborations of equally ranked researchers. Alternatively, they may correspond to the early co-author footprint of groups newly entering the field. Such embryonic footprints would be too weak to reflect the underlying social hierarchy in seniority and productivity. However, the fact that most of these small hubless clusters are not new, but are either continuous or recent (see Sect. 4 of the Supplementary Material for details), weakens the latter argument somewhat. Therefore, further research is needed to understand the composition and role of those entities in a research specialty that are represented by small hubless clusters in the co-author network.
Finally, we find that the clusters in the collaboration networks tend to be the larger and older ones (see Sect. 5 of the Supplementary Material for details). In our interviews participants indicated a number of preconditions for successful inter-group collaborations, emphasizing that the ‘inter-personal chemistry’ has to be right. They stated that sufficient overlap in research interest has to exist between the partners (although not so much overlap that competition may develop), and that appropriate funding needs to be available or has to be mobilized. This suggests why time is an important factor: potential partners in inter-group collaboration need time to meet, to develop trust in the likely benefits of collaborating with one another, and eventually to carry through collaborative research. Nevertheless, even if we only regard the continuous cohort of clusters, we still find the association of collaboration propensity with cluster size. For fields A and C this association is very pronounced, with only 13.5% and 16.9% of small clusters collaborating, respectively, whereas 58.8% and 62.7% of the large clusters collaborate. For field B, which shows a lower collaboration propensity overall, the corresponding numbers are 6.8% for small clusters and 22.2% for large clusters. This lower involvement of small but continuous clusters in the collaboration network may indicate that their research focus is marginal to the research specialty under investigation, or that they contribute to the core knowledge of the specialty while operating in a very specific (less collaborative) working mode. Again, a better understanding of the social organisation and working mode of coauthor groups represented by small clusters should help to illuminate the reasons for their lower inter-group collaboration propensity.
Our results on the transfer and collaboration networks show that the extent of the giant components of our co-author networks is primarily defined by transfer type relationships between clusters. This implies that about 90% of authors with at least two publications in the field are part of a group network where groups are loosely in contact by some kind of transfer interaction described above. So whereas the transfer network can be characterized as a pervasive global cooperative network, the collaboration network is much more selective. In all three fields it includes less than a third of the clusters, and less than half of the authors from the giant components of the networks.
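The share of authors belonging to the giant component of a reduced co-author network can be computed from raw publication records along the following lines. This is a minimal sketch in Python, not the tooling used in this study: the function name `giant_component_share` and the input format (one list of author names per paper) are assumptions made for illustration, and author names are taken at face value without disambiguation.

```python
from collections import Counter, defaultdict
from itertools import combinations


def giant_component_share(papers):
    """Share of authors in the largest connected component of the
    reduced co-author network (authors with at least two papers).

    `papers` is a list of author-name lists, one per publication
    (a hypothetical input format chosen for this sketch).
    """
    # Count papers per author; set() guards against duplicate names on one paper.
    counts = Counter(a for authors in papers for a in set(authors))
    kept = {a for a, n in counts.items() if n >= 2}
    if not kept:
        return 0.0

    # Link every pair of retained authors who share a paper.
    adj = defaultdict(set)
    for authors in papers:
        present = sorted(set(authors) & kept)
        for a, b in combinations(present, 2):
            adj[a].add(b)
            adj[b].add(a)

    # Depth-first search to find the size of the largest component.
    seen = set()
    largest = 0
    for start in kept:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        largest = max(largest, size)
    return largest / len(kept)
```

Because one-time authors are dropped before the components are computed, the resulting share refers to the reduced network only.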
From the visualization in Fig. 8 of the dominant geographical affiliations of clusters in the collaboration network we can deduce that geographical proximity at continent level correlates with the local density of the collaborative network, such that substructures (whether separate components in field B, or just denser subnetworks in fields A and C) tend to be geographically homogeneous. Interestingly, the far less numerous North American clusters contrast somewhat with European and Asian clusters in that they do not form a distinct substructure, but seem more globally interspersed. Hence our analysis would seem to confirm the continentalization of science hypothesis of Leclerc and Gagné (1994) only for Europe, while North America and Asia exhibit a different pattern, at least for the research specialties under investigation here: North American clusters have a rather low tendency to link to one another, and the Asian network exhibits a distinct substructure of major national networks (based on author names we can identify a Chinese, a Japanese, and a Korean sub-cluster in fields A and C, plus a small distinct Indian sub-cluster in field A).
Mesoscopic structure of co-author networks in chemistry: field specific differences
As for the differences that we observe between the three fields, some observations point to unique field-specific conditions, such as the historical evolution of a field as reflected in the growth pattern of its co-author network, and the relative sizes of its cluster age cohorts. The features of the geographically resolved collaboration networks in Fig. 8 also point to differences in the material conditions and research styles between the three fields. For example, the stronger integration of the network in field A may be the consequence of large, expensive, internationally shared instrumentation used in field A. The relative under-representation of North American clusters, especially in field C, and the lack of a dense North American sub-network of clusters match the perception of one of our informants that the funding priorities of the U.S. science system leave little funding for collaborative research in this research specialty.
Co-author clusters in field B are smaller than in fields A and C, in the sense that field B has fewer large clusters and more small clusters than fields A and C. This indicates that the size of the basic collective units necessary to make relevant research contributions to the respective fields is smaller in synthetic chemistry (represented by our field B) than in physical chemistry (represented by our fields A and C).
The predominance of single-hub clusters over multi-hub clusters in field B points to a preferred organizational model where one senior researcher based at a university leads a group of younger researchers and students that he or she trains. While this model is not unusual in fields A and C either, those fields show more than three times as many multi-hub clusters as field B, indicating a greater proportion of research networks, either institutional ones or distributed groups teaming up to form larger collective units.
A greater collaboration propensity in fields A and C is further underlined by the analysis of the properties of the collaboration networks in the three fields. As Fig. 8 shows, the differences between the collaboration network in field B on the one hand, and the collaboration networks of fields A and C on the other hand, are striking. The small size of the collaboration network of field B is not a direct consequence of the smaller overall size of the co-author network of field B, but is due to the lower percentage of clusters participating in the collaboration network and the lower percentage of collaboration-type between-cluster links. Also, the collaboration networks of fields A and C are locally denser (higher cluster node degrees) and more cohesive, in the sense that they have a giant component, whereas the collaboration network of field B breaks up into separate, disconnected components of similar sizes.
Another indicator that points to a lower collaboration propensity of field B is the observed shift of the distribution of node role types for non-hubs and hubs in fields A and C towards node types that are more connected to other clusters. Field B has a higher proportion of so-called ‘ultra-peripheral’ non-hub nodes that do not have any outside coauthor links, as well as a higher proportion of ‘provincial’ hub nodes that have few if any outside links. This could be explained by field B having a majority of PI-led groups of students and possibly postdocs that have no or only minimal collaborative links outside their own group.
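The role labels used above (‘ultra-peripheral’ non-hubs, ‘provincial’ hubs) match the terminology of the node role classification proposed by Guimerà and Amaral (2005), which assigns each node a within-cluster degree z-score and a participation coefficient P. Assuming that scheme and its published thresholds, a minimal Python sketch could look as follows. The function name `node_roles` and the input format (an edge list plus a node-to-cluster mapping) are hypothetical, and the exact parameters used in this study may differ.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def node_roles(edges, cluster_of, z_hub=2.5):
    """Return {node: (z, P, role)} for every node in `cluster_of`.

    edges      -- iterable of (u, v) co-author links (undirected)
    cluster_of -- dict mapping each node to its cluster label
    z_hub      -- z-score threshold separating hubs from non-hubs
                  (2.5 in Guimera and Amaral's scheme)
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        if u != v:
            nbrs[u].add(v)
            nbrs[v].add(u)

    # Within-cluster degree: number of neighbours in the node's own cluster.
    k_in = {n: sum(1 for m in nbrs[n] if cluster_of[m] == c)
            for n, c in cluster_of.items()}

    # Mean and (population) standard deviation of k_in per cluster.
    members = defaultdict(list)
    for n, c in cluster_of.items():
        members[c].append(k_in[n])
    stats = {c: (mean(ks), pstdev(ks)) for c, ks in members.items()}

    roles = {}
    for n, c in cluster_of.items():
        mu, sigma = stats[c]
        z = (k_in[n] - mu) / sigma if sigma > 0 else 0.0
        k = len(nbrs[n])
        # Participation coefficient: 0 if all links stay in one cluster.
        per_cluster = Counter(cluster_of[m] for m in nbrs[n])
        p = (1.0 - sum((x / k) ** 2 for x in per_cluster.values())) if k else 0.0
        if z >= z_hub:
            role = ("provincial hub" if p <= 0.30
                    else "connector hub" if p <= 0.75
                    else "kinless hub")
        else:
            role = ("ultra-peripheral" if p <= 0.05
                    else "peripheral" if p <= 0.62
                    else "connector" if p <= 0.80
                    else "kinless")
        roles[n] = (z, p, role)
    return roles
```

In this sketch a node without any links outside its own cluster has P = 0 and is therefore labelled ‘ultra-peripheral’ (non-hub) or ‘provincial hub’ (hub), which corresponds to the two role types discussed for field B.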
With only three research specialties in this case study the empirical basis is too small to generalize findings to the entire sub-disciplines of physical chemistry or synthetic chemistry. But the observations made suggest the sensitivity of our approach to detecting field-specific and possibly sub-discipline specific collaboration patterns, and they demonstrate its power to extract salient features and to generate directions for further research to understand the social configurations and processes underlying differences in collaboration patterns.
Limitations of this study
Finally, we point to limitations of this study. Some are technical, and we plan to address them in future work; others are fundamental and inherent to the approach.
We build the co-author networks from data covering a period of roughly 20 years. Our analysis introduces only a preliminary temporal analysis. We see some value in refining the temporal analysis to study cluster evolution with time, but there is an inherent limit to achieving this kind of resolution from publication data. Publications tend to be temporally sparse indicators of the underlying research activity and social organisation (sometimes with many years of delay until a result, possibly obtained much earlier, gets published), such that if we choose too small a time window, we will underestimate the size of co-author groups. Hence, to get a sense of a research group as a research-performing collective and of the kind of material and social resources that it builds upon, we need complementary information, e.g. from observation in a field study.
Author name disambiguation
Certain observations indicate that author name homonymy, especially that due to a small set of common surnames used widely in some East Asian countries such as China and Korea, distorts the co-author networks and their structures. This issue will impact our analysis in several ways, and the net effect requires careful study. From our ongoing work on such an analysis (to be published) we can report that the qualitative features discussed here, in particular the field differences in the collaboration networks including their geographical ordering, remain valid for a disambiguated version of the data sets used here.
Another challenge remains the issue of field delineation and ensuring that a publication data set is representative of a research specialty. Depending on the field and the agreement among actors about its intellectual boundaries, developing appropriate lexical queries in interaction with our study participants is a time-consuming, iterative process. In the areas we are studying there is no easy match to a defined set of specialised journals that one could focus on to short-circuit the process. In the current stage of our work we prefer this interactive process, as it also tells us about the manifold perspectives of participants on their research context, and about the complexity of the construct ‘research specialty’. Still, there are ongoing efforts to refine and automate field delineation (Bassecoulard et al. 2007; Mogoutov and Kahane 2007) that might prove crucial for realizing larger comparative studies.
Our analytic method is calibrated to the research environment of certain fields in chemistry and physics. We believe that conceptually it can be transferred to other fields and disciplines as well. But it may have to be significantly altered to be effective, since in other research contexts different data and interaction forms will be relevant (e.g. in computer science the predominant role of conferences, and secondary role of journal publications). Finally, we are capturing here exclusively collaboration forms that manifest themselves in co-authorship—other forms of informal collaboration are not represented in the networks, although informants underline their importance to their work. There are additional traces of interaction and influence, such as inter-citation, or workshop participation, to complement the picture, but there will remain important forms of informal exchange that are not documented.
Summary and conclusions
Using a combination of qualitative and quantitative methods we arrive at fundamental insights about the collective organization of research specialties in chemistry. We base our study on observations from the mesoscopic analysis of linking patterns in clustered co-authorship networks. These point to different structural types of co-author groups as well as different types of collaborative relations between co-author groups. We clarify the meaning and validate the relevance of those structural differences by matching these structural features with real world scenarios described in interviews with field study participants. We proceed with an empirical investigation of such features in the co-authorship networks of three research specialties in chemistry. This exposes a number of generic features of collaboration such as its geographical ordering, as well as field-specific and possibly sub-discipline specific features such as dominant structures of the most basic collective units in a research specialty and inter-group collaboration propensity.
What is the nature of the ‘scatter’ of small co-author groups, that is, their social organization and scientific working mode; and how does this scatter of small groups relate to the transient and less productive ‘scatter authors’ that Morris (2005) and Morris et al. (2007) contrast with the ‘core authors’ in a research specialty?
How is international collaboration in a research specialty shaped; in what way and how strongly are national research communities internally and internationally interlinked? Whereas previous bibliometric studies into international collaboration based their analyses on the statistical co-occurrence of countries in paper affiliations (Glänzel and de Lange 1997, 2002; Zitt et al. 2000), the analytic tools developed in this paper provide access specifically to the network of inter-group collaboration in a research specialty—and hence the self-organizing network character of science emphasized by Wagner and Leydesdorff (2005) and Leydesdorff and Wagner (2008), but at the level of research collectives in research specialties. Inspection of features of these networks will help to identify further locations of interest e.g. to explore motivations and conditions of collaboration.
We intend to further develop our methodology by a refinement of the network analysis to explore the temporal dimension of cluster growth, the structural organization of clusters as basic collective research performing units, and the role of author nodes in particular structural positions. An important pre-condition is a reasonably good disambiguation of author identities that we are currently working on. Further we plan to extend the empirical base into other research fields to explore variations in patterns and how they may relate to epistemic differences between fields.
We call these ‘seed’ groups as we regard them as entry points into scientific communities, and plan to extend our field studies following links (of cooperation or competition) of these seed labs.
It is worth noting that these specialties do not fall squarely into single sub-disciplines but typically unite the efforts of researchers who identify with different subdisciplines. For field B, which we label here as ‘synthetic chemistry’, these are mainly organic chemists, inorganic and organo-metallic chemists as well as polymer chemists. For field A, which we label here as belonging to the subdiscipline ‘physical chemistry’, these are physical chemists as well as experimental and theoretical physicists, often with a background in atomic and molecular physics, but also in nuclear physics. For field C these are indeed mostly physical chemists.
When comparing this number to other co-author networks in the literature, remember that this number is calculated for a reduced co-author network (after excluding one-time authors). Hence this number will overestimate the relative size of the giant component for the unreduced network.
Available from Martin Rosvall’s home page at http://www.tp.umu.se/~rosvall/code.html.
Our field studies confirm that in areas of synthetic chemistry the task of having to find someone with the instrumental equipment to conduct certain measurements on your sample is very common.
Institutional Review Board, http://en.wikipedia.org/wiki/Institutional_review_board.
We are indebted to our field study participants. Further, this research has been made possible through financial support by the National Science Foundation through grants IIS-738543 SGER: Advancing the State of eChemistry, DUE-0840744 NSDL Technical Network Services: A Cyberinfrastructure Platform for STEM Education, and NSF award 0404553. Support also came from Microsoft Corporation for the project ORE-based eChemistry. We are grateful to those that make our work so much more effective by making neat tools and algorithms available on the Web, such as Martin Rosvall (infomap clustering code), Vladimir Batagelj and Andrej Mrvar (pajek), Michael Weseman (plot), and Peter Mcaster (OmniGraffle extensions for pie charts).
- Althouse, B., West, J., Bergstrom, T., & Bergstrom, C. (2008). Differences in impact factor across fields and over time. Arxiv preprint arXiv:0804.3116.
- Bassecoulard, E., Lelu, A., & Zitt, M. (2007). A modular sequence of retrieval procedures to delineate a scientific field: From vocabulary to citations and back. In Proceedings of 11th international conference on scientometrics and informetrics (ISSI 2007), Madrid, Spain, pp. 25 ff.
- Batagelj, V., & Mrvar, A. (2003). Analysis and visualization of large networks. In M. Juenger & P. Mutzel (Eds.), Graph drawing software (pp. 77–103). Berlin: Springer.
- Caruana, R., Elhawary, M., Nguyen, N., & Smith, C. (2006). Meta clustering. In Proceedings of the sixth international conference on data mining (ICDM’06).
- Crane, D. (1972). Invisible colleges: Diffusion of knowledge in scientific communities. Chicago: The University of Chicago Press.
- Fortunato, S., & Castellano, C. (2007). Community structure in graphs. arxiv: 0712.271.
- Knorr Cetina, K. (1999). Epistemic cultures: How the sciences make knowledge. Cambridge: Harvard University Press.
- Lievrouw, L. A. (1990). Reconciling structure and process in the study of scholarly communication. In C. L. Borgman (Ed.), Scholarly communication and bibliometrics. London: Sage Publications.
- Morris, S. (2005). Manifestation of emerging specialties in journal literature: A growth model of papers, references, exemplars, bibliographic coupling, cocitation, and clustering coefficient distribution. Journal of the American Society for Information Science and Technology, 56(12), 1250–1273.
- Morris, S., Goldstein, M., & Deyong, C. (2007). Manifestation of research teams in journal literature: A growth model of papers, authors, collaboration, coauthorship, weak ties, and Lotka’s law. Journal of the American Society for Information Science and Technology, 58(12), 1764–1782.
- Sen, P. (2006). Complexities of social networks: A Physicist’s perspective. Arxiv preprint physics, 0605072.
- Whitley, R. (2000). The intellectual and social organization of the sciences. Oxford: Clarendon Press.