Identification of successful mentoring communities using network-based analysis of mentor–mentee relationships across Nobel laureates

Skills underlying scientific innovation and discovery generally develop within an academic community, often beginning with a graduate mentor’s laboratory. In this paper, a network analysis of doctoral student-dissertation advisor relationships in The Academic Family Tree indicates the pattern of Nobel laureate mentoring relationships is non-random. Nobel laureates had a greater number of Nobel laureate ancestors, descendants, mentees/grandmentees, and local academic family, supporting the notion that assortative processes occur in the selection of mentors and mentees. Subnetworks composed entirely of Nobel laureates extended across as many as four generations. Several successful mentoring communities in high-level science were identified, as measured by number of Nobel laureates within the community. These communities centered on Cambridge University in the latter nineteenth century and Columbia University in the early twentieth century. The current practice of building web-based academic networks, extended to include a wider variety of measures of academic success, would allow for the identification of modern successful scientific communities and should be promoted.

Electronic supplementary material: The online version of this article (doi:10.1007/s11192-017-2364-4) contains supplementary material, which is available to authorized users.

One reason for the strong positive skew found for ancestors, descendants, and mentees/grandmentees is the nature of network data, where some individuals serve as source nodes without ancestors and other individuals serve as sink nodes without descendants. This increases the number of outcomes measuring zero in the data. In this case, having zero Nobel family members is a result of having zero family members. Alternatively, a number of individuals in the network have ancestors or descendants, but none of them are Nobel laureates.

[...] for this in estimating the number of local Nobel laureate family, because inclusion in a connected network necessarily meant that at least one family connection existed.
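The effect of source and sink nodes on zero counts can be illustrated with a minimal sketch. The toy mentor-mentee edges, the node names, and the helper functions below are hypothetical and are not taken from The Academic Family Tree or from the authors' program; the sketch only assumes that the network is a directed graph with edges pointing from mentor to mentee.

```python
from collections import defaultdict

def build_graph(edges):
    """Return (children, parents) adjacency maps from (mentor, mentee) pairs."""
    children, parents = defaultdict(set), defaultdict(set)
    for mentor, mentee in edges:
        children[mentor].add(mentee)
        parents[mentee].add(mentor)
    return children, parents

def reachable(start, adjacency):
    """All nodes reachable from start by following adjacency, excluding start."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    seen.discard(start)
    return seen

def nobel_family_counts(node, children, parents, laureates):
    """Count all and Nobel-only ancestors and descendants of one node."""
    ancestors = reachable(node, parents)
    descendants = reachable(node, children)
    return {
        "ancestors": len(ancestors),
        "nobel_ancestors": len(ancestors & laureates),
        "descendants": len(descendants),
        "nobel_descendants": len(descendants & laureates),
    }

# Hypothetical toy network: A mentors B; B mentors C and D; E also mentors D.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("E", "D")]
laureates = {"A", "C"}  # assumed Nobel status, for illustration only

children, parents = build_graph(edges)
counts = {n: nobel_family_counts(n, children, parents, laureates) for n in "ABCDE"}

for name in "ABCDE":
    print(name, counts[name])
```

In this toy network, A is a source node, so its zero Nobel ancestors simply reflect having no ancestors at all, whereas E has one descendant but no Nobel descendants. These are the two distinct routes to a zero outcome described above.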

For all four analyses, negative binomial models were chosen to adjust for greater than expected dispersion in the data (i.e., a high variance to mean ratio). Spearman's correlations (see Table 1) indicated that the number of Nobel laureate family members was positively related to the size of the academic family. Therefore, in each case, the size of the academic family was entered along with Nobel status as a predictor of the size of the Nobel laureate academic family.
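The variance to mean ratio that motivates a negative binomial model can be checked directly. The sketch below is illustrative only: the count vector is invented, and the function name `dispersion_ratio` is an assumption rather than part of the authors' analysis. It relies on the fact that a Poisson-distributed count has a variance equal to its mean, so a ratio well above 1 suggests overdispersion.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by the mean; a Poisson variable would give about 1."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; the ratio is undefined")
    return variance(counts) / m

# Hypothetical counts of Nobel laureate family members per scientist:
# mostly zeros with a few large values, i.e. a strong positive skew.
nobel_family_sizes = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 9]

ratio = dispersion_ratio(nobel_family_sizes)
print(f"variance-to-mean ratio: {ratio:.2f}")
```

For these invented counts the ratio is far greater than 1, which is the kind of pattern for which a negative binomial model is commonly preferred over a Poisson model.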

As described in the Methods, the significance level for each analysis was adjusted by comparing the observed test statistics with a distribution of expected test statistics, derived from 1,000 topologically identical networks, each with a random permutation of Nobel status. The regression model coefficients and the distributions of random coefficients used to adjust the significance levels of predictors in the models are available in Table S1.

[...] In contrast to the previous two results, the number of mentees/grandmentees did serve as a significant predictor of number of Nobel laureate mentees/grandmentees (adjusted p < 0.001).

Still, after controlling for family size, Nobel laureates had a greater number of Nobel laureate mentees and grandmentees than did non-Nobel laureates (adjusted p < 0.001). Finally, Nobel laureates also had a greater number of local Nobel laureates in their academic family than did non-Nobel laureates (adjusted p < 0.001). The number of local academic family members did not significantly predict the number of Nobel laureates (adjusted p < 0.964).

bioRxiv preprint first posted online Sep. 15, 2016; doi: http://dx.doi.org/10.1101/075432. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.

[...] trained and spent his early years as an academic in Italy during the early 20th century, but traveled to Germany to study with Max Born (physics, 1954) and Paul Ehrenfest 11. Eventually, near the beginning of WWII, and on winning his Nobel Prize, Fermi moved to the United States and joined the Columbia University community centered on Isaac Rabi. In another subnetwork operating around the same time period (Fig. 2D) [...]

It is significant that many of the successful communities identified by this network analysis existed at a time when travel and communication were much more difficult than they are today. Ernest Rutherford (chemistry, 1908) traveled from New Zealand to attend Cambridge as one of the first students admitted from outside the university 17,18. This occurred in the latter half of the 19th century, prior to the invention of the airplane and intercontinental telephone service.

At this point in history, physical proximity was critical to the transmission of ideas and expertise.

In modern science, however, virtual meetings, video lectures, online courses, and online databases (e.g., PubMed 19, Google Scholar 20) provide remarkably easy access to current, innovative ideas in science. It seems likely that the mentoring patterns among scientists are being radically altered by greater accessibility to information and each other. Still, for many scientists, it is difficult to imagine that virtual proximity could ever be a satisfying replacement for the day- [...]

[...] of 57,831 nodes was significantly larger than any of the other components and held the vast majority of Nobel laureates (402 of 472). In fact, in Table S5, [...]

[...] normalized to fall between 0 and 1, with 1 representing the greatest possible diversity for a given number of Nobel laureates.

[...] To accomplish this, the C++ program described earlier had options available for generating 1,000 networks with Nobel status randomly assigned to nodes across the network in the same proportion as the true data, each time recomputing outcome measures for each node. As can be seen in Fig. 4, this produced alternate networks with equivalent topology (i.e., the same number of family members and academic structure for each node) but randomly distributed Nobel laureates and, thus, random outcomes.
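The permutation procedure described above can be sketched in a few lines. This is not the authors' C++ program: the toy network, the function names, and the test statistic are all assumptions made for illustration. In particular, the sketch swaps in a simple difference in mean Nobel mentee counts where the paper uses negative binomial regression coefficients, and it uses a one-sided comparison with the common add-one correction, neither of which is stated in the text.

```python
import random

def nobel_mentee_counts(mentees, status):
    """Number of laureate mentees for every node, given a Nobel status map."""
    return {node: sum(status[m] for m in kids) for node, kids in mentees.items()}

def statistic(mentees, status):
    """Mean Nobel mentee count of laureates minus that of non-laureates."""
    counts = nobel_mentee_counts(mentees, status)
    lau = [counts[n] for n in mentees if status[n]]
    non = [counts[n] for n in mentees if not status[n]]
    return sum(lau) / len(lau) - sum(non) / len(non)

def permutation_p_value(mentees, status, n_perm=1000, seed=0):
    """Compare the observed statistic with statistics from shuffled Nobel labels.

    The topology (the mentees map) is held fixed; only the labels move, so the
    proportion of laureates is the same in every permuted network.
    """
    rng = random.Random(seed)
    nodes = list(status)
    labels = [status[n] for n in nodes]
    observed = statistic(mentees, status)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = dict(zip(nodes, labels))
        # Outcome measures are recomputed for each permuted network.
        if statistic(mentees, shuffled) >= observed:
            hits += 1
    # Add-one correction (an assumption here) keeps the estimate above zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical toy network: every node appears as a key, leaves have no mentees.
mentees = {"A": ["B", "C"], "B": ["D"], "C": [], "D": [], "E": ["F"], "F": []}
status = {"A": True, "B": True, "C": False, "D": True, "E": False, "F": False}

observed, p_value = permutation_p_value(mentees, status)
print(f"observed difference: {observed:.3f}, adjusted p: {p_value:.3f}")
```

Shuffling the labels rather than redrawing them keeps the number of laureates constant, which matches the description of Nobel status being assigned in the same proportion as the true data. With only six nodes the resulting p-value is not meaningful; the toy data serve only to show the mechanics.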