Introduction

To ensure the dynamic flow of scientific ideas and expertise, and to promote and facilitate knowledge production, national science systems rely on the international exchange of scholars (Moed et al. 2013; Conchi and Michels 2014; Robinson-Garcia et al. 2019). While the globalization of research has numerous benefits that have been widely acknowledged in the literature (Appelt et al. 2015; Bauder 2015; Franzoni et al. 2015; Netz and Jaksztat 2017), the undeniable downside of academic mobility is the potential loss of talent for countries that train and export more researchers than they receive from other countries. Global competition for talent has led to the introduction of a range of policies and economic incentives aimed at encouraging balanced flows of researchers. However, little attention has been paid to returnees who stay in other countries temporarily, and then return to their country of academic origin. These returnees usually bring with them additional skills, newly established connections and collaborative ties, and complementary expertise acquired abroad (OECD 2008; Appelt et al. 2015). Equally importantly, there is evidence that these returnees tend to receive far more citations than their stationary counterparts (Guthrie et al. 2017), or their internationally mobile counterparts who do not return (Zhao et al. 2021). Thus, for countries that are facing the challenge of the out-migration of researchers exceeding the in-migration of researchers (Zhao et al. 2021), facilitating the return migration of scholars and taking steps to rebalance these trends to their benefit are extremely critical.

Despite being recognized as a science powerhouse, Germany has, according to some reports, been sending more highly qualified individuals (including researchers) abroad than it has been receiving (OECD 2008; Schiller and Cordes 2016; Zhao et al. 2021). In recent years, Germany has implemented a range of policies and programs designed to attract students and researchers from other countries (Bardin 2016; Eule 2016; Düvell 2019). There are also several return migration programs aimed at maintaining and strengthening ties with previously German-affiliated researchers in order to facilitate their re-integration into the German science system (Conchi and Michels 2014). For example, the German Academic International Network (GAIN; see footnote 1) supports current and prospective returnees, and facilitates cooperation between researchers in Germany and North America. The German Scholars Organization (GSO; see footnote 2) is another initiative aimed at reversing Germany’s “brain drain” and turning it into a “brain gain” by offering several services to academic professionals in Germany. Given the practical relevance of this issue to policy development and strategic decisions at a national level, a better understanding of the trajectories and migration trends of researchers formerly affiliated with German institutions and of academic returnees to Germany is urgently needed.

Return migration also has potentially large implications for the persistence of gender inequalities in academia (Zippel 2017). The issue of gender disparities in academia has been extensively documented across many disciplines and in most countries (Larivière et al. 2013; Huang et al. 2020; Zhao et al. 2021). Although it has been suggested that breaking the glass ceiling that hinders women’s advancement is especially challenging for internationally mobile academics (Zippel 2017; Zhao et al. 2021), the heterogeneity in the levels of gender disparity among former and current German-affiliated researchers, and particularly among returnees, is not clear. Exploring this topic is the first step towards achieving more balanced gender representation in academia, and ensuring the sustainability of academic careers for women researchers (Weert 2013; Zhao et al. 2021). Due to the implementation of a wide range of measures and policies aimed at promoting gender equality in academia, female representation in various disciplines has been increasing (Macaluso et al. 2016; Zippel 2017; Huang et al. 2020). Given these developments, studying the temporal trends at the intersection of gender and international mobility can have important administrative and policy implications. Moreover, the question of how the representation of female returnees in more recent cohorts has changed in response to these policies and developments deserves more attention. An extensive analysis of return migration among German-affiliated researchers disaggregated by discipline, gender, and cohort is critical for making progress towards transforming the German science system into an inclusive and diverse system with balanced migration flows of scholars.

Previous studies have suggested that academic returnees tend to maintain collaborative ties with their previous host countries (Ackers and Gill 2005; Conchi and Michels 2014; Franzoni et al. 2014; Guthrie et al. 2017). As these ties are critical components of knowledge transfers, it is clear that scholarly migration is no longer a zero-sum game. Returnee researchers may face challenges when attempting to incorporate the knowledge they have acquired abroad into another context, and in re-establishing their career in their academic home country (Melin and Janson 2006; Weert 2013; Fernández-Zubieta et al. 2015). This may be partly because researchers are disconnected from their home country’s academic networks while abroad, which may limit their access to the information and support they would need to find a job in their academic home country. This erosion of connections may, in turn, reduce the willingness of researchers to return (Ackers and Gill 2005; Baruffaldi and Landoni 2012). Previous work on this topic has shown that there is a gap in macro-level quantitative research on collaboration and migration among scholars. In particular, the interactions between scholars’ collaborative ties with their academic home countries and return migration have not been previously investigated. Therefore, a comprehensive analysis that examines these relationships can provide useful insights into return migration among academics. This can facilitate the development of policies that create additional paths for returnees to re-integrate professionally into their academic home countries.

Motivated by the observations above, this paper relies on large-scale digitized bibliometric data from Scopus (Burnham 2006; Mongeon and Paul-Hus 2016) to investigate the trajectories and migration trends of internationally mobile researchers in Germany, as well as the academic links they maintain with Germany while they are abroad. We analyze researchers who published with a German affiliation during the 1996–2020 period, while taking each researcher’s years of experience, gender, discipline, and cohort into account. Specifically, this paper aims to address the following research questions (which are both methodological and empirical), with a focus on the return migration of scholars to Germany:

  1. What is the composition of returning researchers based on their gender, years of experience, and previous host countries? (Sects. “Age and gender composition of researchers” and “Out-migration and return migration by geography”)

  2. How does the gender ratio vary by discipline and cohort among researchers who leave Germany, and among researchers who return? (Sects. “Rates of departure and return across cohorts” and “Gender composition of outward and return streams by discipline and cohort”)

  3. How does the association between return migration to Germany and collaboration with German institutions vary across disciplines and cohorts? (Sect. “Collaborative ties with Germany and rates of return”)

Materials and methods

Authorship records of German-affiliated researchers

Scopus is an abstract and citation database of scientific literature (Burnham 2006) that covers over 77 million publications, according to its 2020 coverage guide (Elsevier 2020). From the Scopus database, we have obtained the authorship records (linkages between an author’s affiliation and a publication) of more than 1.1 million researchers for our analysis, which involves over eight million publications. All of these researchers had used a German affiliation address in at least one publication at some point during the 1996–2020 period (ending in April 2020).

Pre-processing of raw bibliometric data

To ensure the reliability of our results, we pre-process the raw bibliometric data from Scopus using three sequential steps. These three steps are: handling the missing countries in the dataset, disambiguating the author profiles, and inferring gender from the authors’ first names (discussed in Sect. “Pre-processing of raw bibliometric data”).

Handling missing countries in the dataset

First, there are 74,430 (5%) authorship records for which the country variable is missing. To handle the missing values systematically, we have developed a neural network algorithm inspired by Miranda-González et al. (2020) that predicts the missing countries with a high degree of accuracy. This supervised learning algorithm takes an affiliation address as the input, and predicts the country as the output. The data used to make the prediction are city, institution, and address. These strings are combined using a bag-of-words method in which the frequency of a given word in a sample is normalized relative to its frequency in the whole dataset, using a term frequency-inverse document frequency (TF-IDF) approach (Tokunaga and Makoto 1994). The neural network has a simple and standard architecture (a deep feed-forward neural network) with non-linear activation functions. We use a random set of 1 million authorship records drawn from our dataset that contain country information, and split it into training data (80%) and testing data (20%). Other technical details of the development of the neural network are explained in Miranda-González et al. (2020). The predictions made on the test dataset show that the neural network correctly predicts the country for 98.4% of records, which is a level of accuracy we consider acceptable for handling missing country data.
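The feature-construction step described above can be illustrated with a minimal sketch. It covers only the bag-of-words and TF-IDF weighting, not the neural network itself. It assumes one common TF-IDF variant (relative term frequency times the logarithm of inverse document frequency), which may differ from the exact weighting used in our pipeline. The function names and the sample affiliation strings are hypothetical.

```python
import math
from collections import Counter


def tokenize(address: str) -> list[str]:
    """Lower-case an affiliation string and split it into alphanumeric tokens."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in address)
    return cleaned.split()


def tfidf_vectors(addresses: list[str]) -> list[dict[str, float]]:
    """Return one sparse {token: weight} TF-IDF vector per address."""
    docs = [Counter(tokenize(a)) for a in addresses]
    n_docs = len(docs)
    # Document frequency: number of addresses that contain each token.
    df = Counter(token for doc in docs for token in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            token: (count / total) * math.log(n_docs / df[token])
            for token, count in doc.items()
        })
    return vectors


# Hypothetical affiliation strings (city, institution, and address combined).
sample = [
    "Max Planck Institute Rostock",
    "University of Rostock",
    "University of Oxford",
]
vectors = tfidf_vectors(sample)
```

In this toy sample, a token that occurs in a single address (such as "oxford") receives a higher weight than one shared by several addresses (such as "university"), which is the property that makes such features informative for a country classifier.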

Author name disambiguation

The second step of our data pre-processing helps us overcome the problems associated with using Scopus author IDs to identify unique authors. It has been shown that Scopus author IDs have high levels of precision and completeness (Kawashima and Tomizawa 2015; Paturi and Loktev 2020). Precision measures the percentage of author IDs that are associated with the publications of a single individual only. Completeness measures the percentage of author IDs that are associated with all of the Scopus publications of an individual. The results of an evaluation of the accuracy of Scopus author IDs conducted in August 2020 showed that the precision and the completeness of Scopus author profiles are 98.3% and 90.6%, respectively (Paturi and Loktev 2020). However, while it appears that the quality of individual-level Scopus data is sufficiently high to enable us to study the migration of researchers (Kawashima and Tomizawa 2015; Aman 2018a), there are several notable limitations to keep in mind when using Scopus data for migration research. The precision limits in Scopus author IDs imply that 1.7% of Scopus author IDs may be associated with the publications of more than one person, which could affect the accuracy of the migration events detected by looking at changes in affiliation countries per author ID. Accordingly, as the second step in our data pre-processing, a subset of authorship records that are more likely to have suffered from the precision flaws of Scopus author IDs are analyzed using our conservative author name disambiguation algorithm.

Our author disambiguation algorithm is inspired by the state-of-the-art methods in author name disambiguation (D’Angelo and van Eck 2020). It assumes that every two authorship records are from distinct individuals (despite sharing a Scopus author ID), unless sufficient evidence is found to the contrary using a rule-based scoring approach and a clustering method. We first calculate the similarity score of each pair of authorship records belonging to the same author ID. The similarity is measured based on author names, coauthor names, subjects, funding information, and grant numbers. The author disambiguation algorithm makes all pairwise comparisons between authorship records with the same author ID, and creates a distance matrix based on similarities and dissimilarities in the aforementioned features for each pair of records. A clustering algorithm is then used to process the distance matrices, and to cluster similar authorship records. We then issue revised author IDs based on the resulting clusters. We use the agglomerative clustering algorithm from the scikit-learn Python library (Pedregosa et al. 2011) to cluster authorship records. This algorithm belongs to the family of hierarchical clustering methods. Supporting our conservative approach, it first places each record in its own cluster, and then merges pairs of clusters successively if doing so minimally increases a given linkage distance (Pedregosa et al. 2011). As well as being compatible with our conservative approach, agglomerative clustering has the advantage of offering us the flexibility to process any pairwise distance matrix.
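To make the clustering step concrete, the following sketch clusters records from a precomputed pairwise distance matrix. It is a standard-library stand-in for the scikit-learn implementation mentioned above, not our actual code. The choice of average linkage, the distance threshold of 0.5, and the toy distance values are all assumptions made for illustration.

```python
def agglomerative_clusters(dist, threshold):
    """Cluster records given a symmetric pairwise distance matrix.

    Starts with one cluster per record and repeatedly merges the two
    closest clusters (average linkage) until no pair of clusters is
    closer than `threshold`. Returns one cluster label per record.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = [
            (linkage(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        ]
        best, x, y = min(pairs)
        if best > threshold:
            break  # conservative: keep records apart without enough evidence
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical distances among four authorship records sharing one author ID:
# records 0 and 1 look alike, and records 2 and 3 look alike.
toy_dist = [
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.85, 0.9],
    [0.9, 0.85, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
]
revised_ids = agglomerative_clusters(toy_dist, threshold=0.5)
```

Because every record starts in its own cluster, a low threshold leaves all records separate, which mirrors the conservative default that two records are distinct unless the evidence says otherwise.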

We examine the author profiles that are outliers in terms of the number of affiliation countries or the number of publications. In particular, there are 30,715 (2%) author profiles that are associated with more than six countries of affiliation, or more than 292 publications (an average of more than 1 publication per month across a period of 24 years and 4 months; see footnote 3). These author profiles are more likely than others to be affected by the precision flaws of Scopus author IDs. For example, such a profile could contain records from more than one individual researcher. Based on these criteria, 25,000 author IDs are classified as suspicious. These author IDs are associated with 2,242,797 publications. After disambiguation, revised author IDs are issued for these records according to their clusters, and are then merged with the rest of the data in preparation for the third pre-processing step.

Inferring gender from first names

The last step of our data pre-processing is inferring gender from first names, which involves looking up first names in a large database of names and genders called Genderize (https://www.genderize.io). After performing basic text operations (like removing middle initials from the first name field), we obtained the gender for 1,117,813 author profiles in our dataset. For the remaining profiles, we manually searched for public author information to determine the gender by checking the individuals’ personal homepages, curricula vitae, online profiles, and biographies in publications, as well as other online sources. Using this manual approach, we were able to determine the genders for 3,139 additional author profiles. In total, the most likely gender for 77% of the author profiles in our dataset was determined through either algorithmic or manual gender detection. For our analyses that involve gender (e.g., measuring gender ratios), we set aside the 23% of author profiles whose gender could not be determined either algorithmically or manually.
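The name-cleaning and lookup logic can be sketched as follows. This is an illustration only: the local dictionary stands in for the name database, its entries and probabilities are invented, and the probability cut-off of 0.8 is a hypothetical parameter that the text above does not specify.

```python
def clean_first_name(raw: str) -> str:
    """Drop initials such as 'J.' or 'K' and return the first given name in lower case."""
    parts = raw.replace(".", " ").split()
    # Tokens of a single character are treated as initials and removed.
    kept = [p for p in parts if len(p) > 1]
    return kept[0].lower() if kept else ""


def infer_gender(raw_first_name, lookup, min_probability=0.8):
    """Return 'female', 'male', or None when the name cannot be resolved.

    `lookup` maps a cleaned first name to a (gender, probability) pair.
    `min_probability` is a hypothetical cut-off chosen for illustration.
    """
    entry = lookup.get(clean_first_name(raw_first_name))
    if entry is None:
        return None
    gender, probability = entry
    return gender if probability >= min_probability else None


# Hypothetical lookup table standing in for a name-gender database.
toy_lookup = {
    "anna": ("female", 0.98),
    "peter": ("male", 0.99),
    "kim": ("female", 0.60),
}
```

Names that return `None` in such a scheme correspond to the profiles that would be passed on to manual checking or set aside in gender-specific analyses.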

Migration events, mobility types, and career stages

The international mobility of researchers is determined by identifying the changes in the affiliation addresses of authors across different publications over time. To more reliably detect migration events, the most frequent (mode) country(ies) of affiliation is extracted for each researcher in each year. A migration event is considered to have happened only if the mode country of affiliation changes for the researcher across different years (Subbotin and Aref 2021). Accordingly, the country of academic origin (country of academic destination) is defined as the mode country during the first (last) year of publishing. Based on the individual’s migration events or the lack thereof, each researcher can be assigned to one of the following four categories:

  1. Non-movers (with Germany being the researcher’s mode country in all years);

  2. Immigrants and transients (origin: not Germany; but with Germany being the researcher’s mode country at some point in time);

  3. Outward researchers (origin: Germany; current country: not Germany); and

  4. Returnees (origin and current country: Germany; but with another country being the researcher’s mode country at some point in time).

Except for non-movers, researchers may move between the categories over time, as an individual’s status depends on the time period being examined. For example, an individual identified as an outward researcher will become a returnee at the next point in time if a move to Germany is detected.
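The detection of mode countries and the assignment to the four categories above can be sketched as follows. The sketch assumes that each authorship record is reduced to a (year, country) pair, that at least one record is available, and that ties between equally frequent countries in a year are broken alphabetically. The tie-breaking rule and the country codes are assumptions made for illustration only.

```python
from collections import Counter


def mode_country_by_year(records):
    """Map each publishing year to its most frequent affiliation country.

    `records` is an iterable of (year, country) pairs, one per authorship
    record. Ties are broken alphabetically, which is an assumption made
    only to keep this sketch deterministic.
    """
    per_year = {}
    for year, country in records:
        per_year.setdefault(year, Counter())[country] += 1
    modes = {}
    for year, counts in per_year.items():
        top = max(counts.values())
        modes[year] = min(c for c, n in counts.items() if n == top)
    return modes


def classify(records, home="DE"):
    """Assign one of the four mobility categories relative to `home`."""
    modes = mode_country_by_year(records)
    sequence = [modes[year] for year in sorted(modes)]
    origin, current = sequence[0], sequence[-1]
    if all(country == home for country in sequence):
        return "non-mover"
    if origin != home:
        # The home country must have been the mode country at some point.
        return "immigrant or transient" if home in sequence else None
    return "returnee" if current == home else "outward researcher"
```

For example, `classify([(2000, "DE"), (2002, "CH"), (2005, "DE")])` yields `"returnee"`, whereas truncating the same record list after 2002 yields `"outward researcher"`, which illustrates how a researcher's status depends on the period being examined.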

We define the academic age (age) of a researcher as the number of years since their first publication. Furthermore, we classify researchers as early-career (senior) if their academic age is 7 years or less (14 years or more). Researchers with an academic age of more than 7 and less than 14 years are categorized as mid-career (Aref et al. 2019). As our dataset covers only the 1996–2020 period, our analysis of some temporal dimensions of the data or cohorts of researchers may suffer from left truncation and/or right censoring. We explain in Sect. “Results” some of the resulting limitations of our dataset.
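The career-stage boundaries can be written out explicitly in a short sketch. The function names are hypothetical, and academic age is assumed to be measured in whole years.

```python
def academic_age(first_publication_year: int, year: int) -> int:
    """Number of years elapsed since the researcher's first publication."""
    return year - first_publication_year


def career_stage(age: int) -> str:
    """Map an academic age in whole years to a career stage."""
    if age <= 7:
        return "early-career"
    if age >= 14:
        return "senior"
    return "mid-career"  # academic ages 8 to 13
```

For instance, a researcher who first published in 1998 has an academic age of 7 in 2005 and is still early-career in that year, but is mid-career from 2006 and senior from 2012 onwards.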

Inferring disciplines using a topic model

The All Science Journal Classification (ASJC) codes in our bibliometric dataset indicate the fields and disciplines (subfields) of publication venues, which could be used as proxies for determining the disciplines of researchers (Zhao et al. 2021). However, because the links between the disciplines associated with journals and the disciplines of authors are indirect, we use a data-driven method to infer the disciplines of individual researchers. Topic modelling, which is a common unsupervised learning approach for natural language processing, can be used to determine the disciplines of researchers by inferring the latent topical structure of textual bibliometric data (Blei 2012; Gerlach et al. 2018).

As a flexible topic model, Latent Dirichlet Allocation (LDA) is in essence a generative probabilistic model with three layers: document, topic, and word (Pritchard et al. 2000; Blei et al. 2003). It assumes that each topic is a mixture of an underlying set of words, and that each document is a mixture of a set of topic probabilities (Blei et al. 2003; Gerlach et al. 2018). LDA has been shown to perform reliably in automatically identifying semantic topic information from large-scale textual data (Dahal et al. 2019).

From the publications authored by each researcher in our dataset, we extract publication titles, journal titles, and keywords to generate the individual’s text corpus (document in LDA terminology). We then tokenize the text by removing all punctuation, and making all of the words lower-case to improve the cohesion of the documents. The remaining words in each text corpus are then lemmatized and stemmed to their root form. This includes converting them to the first person and the present tense if needed. For some common phrases (e.g., machine learning) that are related to discipline topics, we use the multi-word expressions, two-gram collocations, and three-gram collocations from all of the text documents according to their frequency of occurrence. The tokenized and lemmatized texts are then abstracted to a bag of words, which records the indices of words and the number of times each word appears in an author’s LDA document. In LDA, each document can be considered as a mixture of latent topics, each of which is characterized by a distribution of words (Blei et al. 2003). The topic coherence score is a measurement of the semantic similarity between the high scoring words in each topic, and represents the interpretability of the topics.

After the implementation of the steps above, the average topic coherence score of all topics is maximized at 0.67 (through trial and error) when we allow 30 topics for the whole text corpus. Each topic is composed of a set of words and the corresponding weights that indicate their contributions to the topic. The topic with the highest probability among all of the initial topics is assigned to each author’s LDA document as the intermediate result. We manually interpret the topics based on their most frequent terms and assign titles to them accordingly. When different topics include similar or highly relevant keywords, we combine them into a single discipline. For example, a topic involving “space” and “earth” and a topic involving “galaxy” and “star” are combined into the same discipline: “Earth and Planetary Sciences.” Using this approach, we produce 17 distinguishable disciplines (see footnote 4) to represent the main discipline of each researcher according to their publications. Detailed results on the 30 topics and their mapping to 17 disciplines are provided in the “Appendix”. We consider an author’s LDA document to be “Multidisciplinary” if none of its topic contributions exceeds 0.3.
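The final assignment step can be sketched as follows, assuming that the topic model has already produced a probability for each topic in an author's LDA document. The topic indices and the entries of the topic-to-discipline mapping are hypothetical. Only the 0.3 threshold and the merging of the two example topics into “Earth and Planetary Sciences” come from the description above.

```python
def assign_discipline(topic_probs, topic_to_discipline, threshold=0.3):
    """Pick the main discipline for one author's LDA document.

    `topic_probs` maps a topic index to its contribution in the document.
    The document is labelled "Multidisciplinary" when no topic
    contribution exceeds `threshold`.
    """
    best_topic = max(topic_probs, key=topic_probs.get)
    if topic_probs[best_topic] <= threshold:
        return "Multidisciplinary"
    return topic_to_discipline[best_topic]


# Hypothetical topic indices; two topics are merged into one discipline.
toy_mapping = {
    4: "Earth and Planetary Sciences",   # e.g. "space", "earth"
    11: "Earth and Planetary Sciences",  # e.g. "galaxy", "star"
    7: "Medicine",
}
```

With this mapping, a document dominated by either topic 4 or topic 11 is assigned to the same discipline, while a document whose contributions are spread evenly below the threshold is labelled as multidisciplinary.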

Cohorts leaving Germany and returning to Germany

A cohort is a group of people who have experienced a common event in a selected period, such as birth (Reilly et al. 2005; Rothman 2012). In our analysis, we use the time of first publication as the common event for defining cohorts of researchers. To reduce the impact of left-truncated data on cohorts, which is more likely for the first few years of our dataset, we use the following three cohorts: 1998–2001, 2002–2005, and 2006–2009.

Person-time rate is an index commonly used in epidemiology and demography to express an incidence rate: i.e., the number of incidents (migration events) per person-time in a population during a period (Rothman 2012). The denominator of a person-time rate is the total amount of time that the study members are at risk of a certain incident during a period. One key advantage of using person-time rate for migration is that it enables us to consider that different individuals are exposed to migration events for varying amounts of time. The incidents we are interested in are: (1) leaving Germany for the group of all researchers in Germany, and (2) returning to Germany for the group of all outward researchers. Given a specific period of time t, the departure rate per 1000 person-years for a cohort c is defined in Eq. (1).

$$\begin{aligned} R_{\text{departure}(c,t)}=\frac{N_{\text{departure}(c,t)}}{PT_{\text{in Germany}(c,t)}} \times 1000. \end{aligned} \tag{1}$$

In Eq. (1), \(N_{\text {departure}(c,t)}\) represents the number of researchers from cohort c leaving Germany during time period t, and \(PT_{\text {in Germany}(c,t)}\) represents the sum of the number of years each researcher from cohort c stays in Germany (and is exposed to leaving Germany) during period t. The denominator of the departure rate takes all of the researchers who are in Germany into consideration as the population exposed to leaving Germany. Similarly, given a specific period of time t, the return rate per 1000 person-years for a cohort c is defined in Eq. (2).

$$\begin{aligned} R_{\text{return}(c,t)}=\frac{N_{\text{return}(c,t)}}{PT_{\text{outside Germany}(c,t)}} \times 1000. \end{aligned} \tag{2}$$

In Eq. (2), \(N_{\text {return}(c,t)}\) is the number of returnees from cohort c during period t, and \(PT_{\text {outside Germany}(c,t)}\) is the sum of the number of years each outward researcher stays outside of Germany (and is exposed to returning to Germany) during period t. The denominator of the return rate only involves researchers who have left Germany as the population exposed to returning to Germany. We compute the departure rates and the return rates separately for male and female researchers in the three cohorts (1998–2001, 2002–2005, and 2006–2009). Specifically, we consider the departure rates of researchers of different cohorts who leave Germany at the academic ages of 1 to 5, and the corresponding return rates during the first 5 years after their departure from Germany. Taking the 1998–2001 cohort as an example, the departure rate at academic age 1 refers to the outward researchers who were “academically born” in 1998 (1999, 2000, 2001) and leave Germany in 1999 (correspondingly, 2000, 2001, 2002). For the same cohort, the return rate refers to the researchers who returned during the 2000–2004 (2001–2005, 2002–2006, 2003–2007) period among the outward researchers who left Germany in 1999 (correspondingly, 2000, 2001, 2002).
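The person-time rates in Eqs. (1) and (2) share the same form, so both can be illustrated with one small function. The sketch assumes that each researcher in the population at risk is summarized by the number of years of exposure during period t and a flag indicating whether the event (departure or return) occurred. The function name and the toy values are hypothetical.

```python
def person_time_rate(spells, per=1000):
    """Events per `per` person-years for one cohort and period.

    `spells` holds one (years_at_risk, event_occurred) pair per researcher.
    For the departure rate, years_at_risk is the time spent in Germany;
    for the return rate, it is the time spent outside Germany.
    """
    n_events = sum(1 for _, occurred in spells if occurred)
    person_years = sum(years for years, _ in spells)
    if person_years <= 0:
        raise ValueError("person-time at risk must be positive")
    return n_events / person_years * per


# Hypothetical cohort: four researchers exposed for 1, 5, 2, and 4 years,
# two of whom leave Germany during the period.
toy_spells = [(1.0, True), (5.0, False), (2.0, True), (4.0, False)]
toy_rate = person_time_rate(toy_spells)
```

In this toy example there are 2 departures over 12 person-years, which corresponds to about 167 departures per 1000 person-years. Researchers who leave early contribute less exposure time to the denominator than those who stay for the whole period.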

Collaborations with Germany while away

We define the collaborative ratio as a measure of the strength of outward researchers’ academic links with Germany during the period when they were away from Germany. For an outward researcher i who was away from Germany during period t, the collaborative ratio is the simple fraction \(CR_{(i)}=D_{(i,t)}/ N_{(i,t)}\). The numerator, \(D_{(i,t)}\), counts the publications of outward researcher i in period t that list a German affiliation for i or for any co-author. The denominator, \(N_{(i,t)}\), is the number of all publications of outward researcher i during period t. In other words, a publication authored by i during period t counts towards \(D_{(i,t)}\) if at least one of its authors has a German affiliation. Furthermore, the average collaborative ratio for all researchers in each discipline (cohort) is calculated to measure the average strength of the academic collaboration with Germany maintained by the outward researchers in that discipline (cohort).
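The collaborative ratio can be computed as in the following sketch. It assumes that each publication written while away is represented by the set of affiliation countries across all of its authors. The representation, the country codes, and the handling of researchers without publications while away (returning `None`) are assumptions made for illustration.

```python
def collaborative_ratio(publications, home="DE"):
    """Share of a researcher's publications while away that involve `home`.

    `publications` is a list of sets; each set holds the affiliation
    countries of all authors of one publication from the period away.
    Returns None if there are no publications in that period.
    """
    if not publications:
        return None
    with_home = sum(1 for countries in publications if home in countries)
    return with_home / len(publications)


def mean_ratio(ratios):
    """Average collaborative ratio over researchers, skipping undefined ones."""
    valid = [r for r in ratios if r is not None]
    return sum(valid) / len(valid) if valid else None


# Hypothetical researcher with four publications while abroad,
# two of which list at least one German affiliation.
toy_publications = [{"US", "DE"}, {"US"}, {"US", "CH"}, {"DE"}]
toy_cr = collaborative_ratio(toy_publications)
```

Here the numerator is 2 and the denominator is 4, giving a collaborative ratio of 0.5. Averaging such values within a discipline or cohort gives the group-level measure described above.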

Results

Using cleaned and processed bibliometric data associated with over 8 million Scopus-indexed publications over 1996–2020 from more than 1 million German-affiliated researchers, we analyze data on 375,288 female researchers associated with 2,665,139 publications and 745,664 male researchers associated with 6,516,016 publications. Among these researchers, there are 50,803 female mobile researchers (associated with 1,007,606 publications) and 119,298 male mobile researchers (associated with 2,760,282 publications) who have ever migrated between Germany and 194 other countries in our dataset. There are \(103,573 \pm 48,610\) researchers in each discipline, with medicine having the largest number of researchers (199,658) and health professions having the smallest number of researchers (26,398).

Based on pre-processed data, we provide five analyses to describe different aspects of the emigration and the return migration of these researchers. We track their career life courses from a temporal perspective, and their geographic trajectories from a spatial perspective. We then compare the departure rates and the return rates of female and male scholars to explore the gender differences across cohorts and disciplines. Finally, we look at the association between the return rates of outward researchers and the strength of their collaborative ties to Germany. Our inferred migration events dataset is publicly available in a FigShare data repository (Zhao et al. 2022).

Age and gender composition of researchers

We compare the age and gender compositions of three groups of researchers: non-movers, outward researchers, and returnees. Figure 1 compares the age and gender distribution of these three groups using population pyramids, which include individuals who survive as an active researcher up to a certain date (researchers whose latest publication was in 2010 or later). Ignoring the truncated top age, we can see that both female and male non-movers have a pronounced bulge at the transition from early-career ages to mid-career ages, which is presented as an expansive pattern. However, the non-mover age pyramid shows considerably smaller proportions at other ages, with a sharp decline up to age 21, followed by a steady increase until age 25+. The median ages for female and male non-movers are 9 and 10 years, respectively. In the categories of outward researchers and returnees, the overall length of academic trajectories is considerably longer for both female and male researchers. Specifically, the median ages of female outward researchers and returnees are 12 and 14, respectively; and the median ages of male outward researchers and returnees are 13 and 16, respectively.

Fig. 1 Composition of academic age and gender for non-movers, outward researchers, and returnees. Magnify all figures on the screen for higher resolution and more details

Overall, these findings suggest that both the male and the female returnees stayed in academia longer than both the outward researchers and the non-movers. As more evidence on return migrants emerges, the strengths of returnees are increasingly being seen as valuable. It has, for example, been shown that from a historical perspective, returnees tend to make important contributions to local economies and to be relatively successful, both in comparison to people who never migrated and to people who emigrated but did not return (Abramitzky et al. 2019). The findings on the positive impact of return migration are encouraging, and suggest that Germany, as well as other sending countries, should embrace international mobility and the return migration of scholars. While we find that both the male and the female scholars benefited from returning, we also observe that returning to Germany had a more positive impact on the careers of male than of female researchers. For example, the results show that 64.57% of male returnees, but only 51.13% of female returnees, had become senior professionals (see detailed age composition in Fig. 1c). The smaller benefits found for women are not surprising, and point to the ongoing challenges women in academia face.

Out-migration and return migration by geography

Figure 2 illustrates from a geographic perspective the interplay between outflows of researchers from Germany and the corresponding rates of return to Germany, through a density equalizing cartogram (Dorling et al. 2006-09; Houle et al. 2009). Here, the shape of the map polygons is transformed proportionally to the outflows to different countries. The colors represent the differences in the countries’ return rates, as further explained in the legend. The most common host country for researchers from Germany was the United States (US), which received around 24% of the outward researchers from Germany. Next came Switzerland and the United Kingdom (UK), which together attracted 22% of the outward researchers from Germany. In total, these three countries received nearly half of the outward researchers from Germany, and had thus become the most appealing options for German researchers interested in pursuing an international academic career. These estimates are also consistent with previous findings that the US, the UK, and Switzerland are the most common origin and destination countries for scholarly migration to and from Germany (OECD 2015; Zhao et al. 2021). The observed pattern for the European countries that received researchers from Germany indicates that the countries that neighbor Germany and German-speaking countries were among the most popular host countries for scholars who began their publishing activity in Germany.

Fig. 2 Outward flows (from Germany) and respective return rates across countries. The sizes of the countries are proportional to the flows of outward researchers from Germany. The colors indicate the differences in the return rates of the German-affiliated researchers returning to Germany from each country. The colored version of the figure is available online in high resolution

As the colors on the map show, the rates of return from the most common receiving countries (those drawn larger on the map) were all below 36%, meaning that about one-third of German-affiliated researchers moved back to Germany, while nearly two-thirds continued their research abroad. While the US hosted the largest share of researchers from Germany, the rate of return to Germany from the US was also relatively high, at 34%. Similarly, while the UK and France were among the top host countries for researchers from Germany, the rates of return to Germany from these countries were also high, at 30% and 29%, respectively. By contrast, the rates of return to Germany were below one-quarter for German-affiliated researchers in Switzerland, Sweden, Austria, and Australia; and the return rate was especially low for German researchers in Switzerland, at only 20%. It thus appears that researchers who moved from Germany to these four countries were comparatively less likely to return. The lower propensity to return may be partly explained by the higher spending on Research & Development (R&D) in these countries. In 2017, Switzerland, Sweden, Austria, and Australia spent about 3.18%, 3.36%, 3.00%, and 3.08% of their GDP, respectively, on R&D. These shares are far above the OECD average of 2.67%, ahead of those of the US (2.85%) and the UK (1.68%), and at levels comparable to that of Germany (3.07%) (OECD 2021). In addition, the lower return rate of German researchers in Switzerland is broadly consistent with our expectations, given that approximately 1.2% of all scientific papers worldwide are produced by Swiss-affiliated researchers, which is remarkable given the country’s small population (Turney 2019).
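To make the two quantities mapped in Fig. 2 concrete, the following minimal sketch shows one way that outflows and return rates per host country could be tabulated. It assumes one record per outward researcher with a host country and a returned/not-returned flag. The function name, the record layout, and the toy data are illustrative assumptions, not the pipeline used in this study.

```python
from collections import defaultdict

def return_rates_by_country(records):
    """Tabulate outflow and return rate per host country.

    records: iterable of (host_country, returned) pairs, one per outward
    researcher (hypothetical layout, assumed for illustration).
    Returns {country: (outflow, return_rate)}.
    """
    outflow = defaultdict(int)
    returned = defaultdict(int)
    for country, did_return in records:
        outflow[country] += 1          # size of the polygon in the cartogram
        if did_return:
            returned[country] += 1
    # return rate = returnees / outward researchers (color in the cartogram)
    return {c: (n, returned[c] / n) for c, n in outflow.items()}

# Toy data only; these values are not taken from the study.
records = [
    ("US", True), ("US", False), ("US", False),
    ("CH", True), ("CH", False), ("CH", False), ("CH", False), ("CH", False),
]
rates = return_rates_by_country(records)
for country, (n, rate) in sorted(rates.items()):
    print(f"{country}: outflow={n}, return rate={rate:.0%}")
```

In this sketch, a country with a large outflow and a low return rate (like the toy "CH" entry) would appear as a large polygon with a color from the low end of the scale.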

Rates of departure and return across cohorts

Figure 3 illustrates the departure rates (left) and the return rates (right) per 1000 person-years, disaggregated by cohort and gender. The academic age at departure is on the y-axis for both outward researchers and returnees. For returnees, the length of time away from Germany is also indicated by a color gradient. Taking the 1998–2001 cohort as an example, around 8 out of 1000 women and 9 out of 1000 men with a German academic origin in this cohort moved abroad at academic age 1. For every 1000 outward researchers who left Germany at academic age 1, around 215 women and 278 men had returned to Germany within 5 years. Among them, 74 women and 88 men had returned to Germany after 1 year, making the first year the most likely year of return for that cohort. In general, there was a slight but stable decline in the departure rates with academic age for all three cohorts. However, the most striking pattern is observed for the 2006–2009 cohort: the departure rates of female researchers exceeded those of male researchers for most ages, especially at academic ages 1, 2, and 3. Specifically, we find that 11 out of 1000 female researchers in this cohort left Germany at academic age 1, while only 9 out of 1000 male researchers left Germany at that age. This result indicates that in this cohort, more female than male researchers chose to migrate early in their careers. Meanwhile, the return rates of the female researchers of all three cohorts were much lower than those of their male counterparts. This difference may be partly related to the longer average length of academic life for male returnees, as Fig. 1 shows. Taken together, these results indicate that female outward researchers had a greater tendency than their male counterparts to remain abroad for longer periods or possibly to settle down in other countries, which may have exacerbated the gender disparities in the German science system.
Thus, the findings suggest that out-migration trends may increase gender disparities within the German academic system unless further action is taken.
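As an illustration of how a rate per 1000 person-years can be read, the sketch below computes age-specific departure rates from a list of departure ages. It rests on simplifying assumptions that are ours and may differ from the exact procedure behind Fig. 3: every researcher still in Germany at a given academic age contributes one full person-year of exposure, and exits from academia are ignored. The function name and the toy data are hypothetical.

```python
def departure_rates_per_1000(departure_ages, max_age=5):
    """Age-specific departure rates per 1000 person-years.

    departure_ages: one entry per researcher; the academic age at which the
    researcher left Germany, or None if no departure was observed.
    Assumption: each researcher at risk contributes one person-year per age.
    """
    rates = {}
    at_risk = len(departure_ages)
    for age in range(1, max_age + 1):
        events = sum(1 for a in departure_ages if a == age)
        rates[age] = 1000 * events / at_risk if at_risk else 0.0
        at_risk -= events  # those who departed are no longer at risk
    return rates

# Toy cohort of 1000 researchers (hypothetical values, not the study data):
# 9 leave at academic age 1, 8 at age 2, and the rest stay.
toy_cohort = [1] * 9 + [2] * 8 + [None] * 983
toy_rates = departure_rates_per_1000(toy_cohort)
print(toy_rates[1])            # rate at academic age 1
print(round(toy_rates[2], 2))  # rate at academic age 2
```

The same logic would apply to return rates, with the population at risk being the outward researchers and the clock starting at the year of departure.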

We also observe that the return rates were generally higher for researchers who moved out of Germany in their later years, and tended to increase with academic age. This trend is more noticeable among male researchers and in the two latest cohorts. The more pronounced increase in return migration at later academic ages for men than for women suggests that there are structural processes that operate at specific moments of the academic life course, and that these processes could further extend the gender differences in German academia.

Fig. 3

The rates of leaving Germany within the first 5 years after first publication per 1000 person-years (left), and the rates of return to Germany within the first 5 years after departure per 1000 person-years (right). The colored version of the figure is available online in high resolution

Gender composition of outward and return streams by discipline and cohort

Considering that the male-to-female ratios of researchers vary across disciplines (Zhao et al. 2021), we take a further look at the gender disparities disaggregated by discipline for the three cohorts, as shown in Fig. 4. The colors in the heat map show that the representation of female researchers varies by discipline in the horizontal dimension, and by cohort in the vertical dimension. The bottom row of the map represents the overall proportion of female researchers in each discipline in Germany over the 1996–2020 period, as a baseline for comparing the variability in the representation of females among the researchers who left and returned over time.

Fig. 4

Proportion of female researchers in different groups by discipline and cohort. The colored version of the figure is available online in high resolution

Compared to the baseline, almost all disciplines appear to be more male-dominated over time among both outward and returnee researchers, albeit to varying degrees. One exception is the field of mathematics (12), in which female researchers accounted for a higher proportion of each of these two migrant categories in the latest cohort (2006–2009), relative to the long-term pattern. Despite the lower representation of female researchers in both the outward and the returnee groups, for the majority of the disciplines, we see an increasing trend in the proportion of female researchers with each successive cohort, in line with our earlier discussion in Sect. “Rates of departure and return across cohorts”.

When comparing the categories of outward researchers and returnees in the same discipline and cohort, we observe that the proportion of female returnees was generally smaller than the proportion of female outward researchers. For example, when we look at the latest cohort of researchers in the field of energy, we find that the female proportion among returnees was much smaller than the female proportion among outward researchers. The overall impression provided by these data is that most disciplines are experiencing rising gender disparities, in part because female scientists who leave Germany are less likely than their male counterparts to return. Despite substantial efforts to increase gender equality in academia, gender disparities seem to remain substantial across disciplines.

Collaborative ties with Germany and rates of return

In this section, we examine the association between the levels of academic collaboration with Germany that researchers maintained while abroad and the corresponding rates of return to Germany. Figure 5 shows a scatter plot of the return rates (y-axis) and the average collaborative ratios (x-axis) for each discipline. Note that the collaborative ratio is the fraction of publications of an outward researcher (during the period outside of Germany) with a German affiliation. The horizontal line indicates the overall average return rate, and the vertical line indicates the average collaborative ratio, for outward researchers across all disciplines. The number of returnees for each discipline is represented by the size of the circles. Overall, the Pearson correlation coefficient between the collaborative ratio and the return rates is 0.45, indicating a moderate positive association. Researchers in most health science and life science disciplines, including medicine, health professions, and psychology, were more likely to return to Germany than researchers in other disciplines, as indicated by return rates above the average rate. When we look at the returnees’ levels of academic collaboration with Germany while abroad, we see that health science returnees, as well as researchers in some physical science disciplines, like earth and planetary science, were more likely to maintain academic ties with Germany, as shown by collaborative ratios that exceed the mean value of 33%. Specifically, we observe that health science researchers maintained stronger collaborative ties with Germany, and were more likely to return; whereas researchers in STEM fields, who tended to leave Germany without maintaining as many collaborative ties, were less likely to return.
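The two quantities used in this section can be sketched as follows. The first function computes a collaborative ratio from a per-publication flag, and the second computes the Pearson correlation coefficient across discipline-level averages. The function names, the flag-based input layout, and the toy values are assumptions made for illustration; in particular, how a publication is judged to carry a German affiliation is left to the caller.

```python
import math

def collaborative_ratio(has_german_affiliation):
    """Fraction of a researcher's publications abroad with a German affiliation.

    has_german_affiliation: list of booleans, one per publication during the
    period outside of Germany (hypothetical layout). Returns None if the
    researcher has no publications in that period.
    """
    if not has_german_affiliation:
        return None
    return sum(has_german_affiliation) / len(has_german_affiliation)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Toy example: one of three publications abroad has a German affiliation.
ratio = collaborative_ratio([True, False, False])

# Toy discipline-level averages (hypothetical, not the values in Fig. 5).
avg_ratios = [0.25, 0.30, 0.35, 0.40]
avg_return_rates = [0.22, 0.31, 0.27, 0.36]
r = pearson(avg_ratios, avg_return_rates)
print(f"collaborative ratio = {ratio:.2f}, Pearson r = {r:.2f}")
```

Because the correlation is computed over discipline averages, it describes an association between disciplines and should not be read as an individual-level effect of collaboration on return.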

Next, we look at the association between collaborative ties and return rates among outward researchers by cohort. The results disaggregated by cohort are shown in Fig. 6, with the average return rates and collaborative ratios in each cohort represented by the horizontal lines and the vertical lines, respectively. Our results show an overall decreasing trend in rates of return by cohort, but the left-truncation of the data complicates the reliable investigation of trends involving the first cohort. Despite the general trend, researchers in health professions and medicine were more likely to return than researchers in other disciplines. The collaborative ratios grew slowly but steadily with each cohort; thus, researchers in the latest cohort maintained relatively strong collaborative ties to Germany. Similar patterns can be observed separately for most disciplines.

The correlations found between the collaborative ratios and return rates in the first two cohorts are in line with the overall pattern shown in Fig. 5, with Pearson correlation coefficients of 0.43 and 0.41, respectively. This association becomes much weaker (the correlation coefficient was 0.29) in the latest cohort, whose discipline averages appear to be scattered widely across the four quadrants. Between cohorts 2 and 3, neuroscience drops from quadrant 1 to quadrant 4, indicating a sharp decrease in return rates, despite an increase in academic links with Germany. Between cohorts 2 and 3, we see an increase in collaborative ratios among the outward researchers in the fields of chemistry and chemical engineering, accompanied by stable return rates. For most other disciplines, however, the return rates tended to decrease, as shown in Fig. 6.

Fig. 5

Return rates and collaborative ratios across disciplines. The colored version of the figure is available online in high resolution

Fig. 6

Return rates and collaborative ratios by discipline and cohort. The colored version of the figure is available online in high resolution

Discussion and future directions

As “science brokers,” researchers develop innovative ideas and make scientific contributions by combining information and resources in various domains using specialized skills and knowledge, which they acquire at different institutions and geographical locations (Williams 2007). International experience can play a substantial role in helping researchers accumulate knowledge, information, and capital, and can thus contribute to their scientific research and academic careers (Teichler 2015; Wang 2020). Our previous study found that internationally mobile researchers accounted for over 16% of the population of Scopus-published researchers who had affiliation ties to Germany over the 1996–2020 period (Zhao et al. 2021). We also observed that despite representing a minority in the German science system, mobile researchers make substantial contributions, as evidenced by the finding that compared to non-mover researchers in Germany, they have higher annual citation rates (Zhao et al. 2021). Because of their more nuanced trajectories and international experience, returnees can make important contributions to the German science system. Here, we have analyzed the return migration of researchers to Germany from several perspectives; i.e., by taking into account their disciplines, cohorts, genders, and levels of collaboration with Germany while abroad.

Our quantitative results for Scopus-published researchers with ties to Germany provide further evidence to support previous findings. The results of our comprehensive analysis of emigration and return migration as two outcomes for researchers who left the German science system indicate that the age and gender compositions of outward researchers and returnee researchers differed from those of non-movers. The median age for returnee researchers was up to 6 years higher than that of non-movers, which suggests that there were substantial differences in their levels of experience. All three groups of researchers differentiated by their levels of experience, from early-career to senior, were heavily dominated by men. The ongoing gender disparities we found throughout the academic life cycle were in line with the findings of previous studies (Vásárhelyi et al. 2021). In particular, we observed that the publishing careers of male returnees were, on average, longer than those of other groups, with more than half of them being in their senior career stage.

The countries receiving the largest flows of researchers from Germany were shown to have some of the highest return rates as well. However, we also found that of the large numbers of German researchers who moved to Switzerland, Sweden, Austria, and Australia, relatively small proportions returned to Germany. Three of these countries have linguistic, cultural, or geographic proximity to Germany. Moreover, they all have higher R&D spending as a share of GDP (OECD 2021) than the UK and the US (and three have higher R&D spending than Germany), which may have helped them to attract and retain published researchers from Germany.

Supporting the representation of female researchers in academia through equitable policies is imperative for Germany (Lutter and Schröder 2020), and for other countries (Morgan et al. 2021). The trajectories of internationally mobile female researchers are a particularly important dimension in evaluating a national science system. We analyzed the gender differences among outward researchers and returnees. Our results indicate that the gender disparities in the German science system tend to be intensified over cohorts. Consistent with evidence showing that the representation of female researchers in academia has risen over time (Huang et al. 2020), we found that the proportion of female researchers has increased among both outward researchers and returnees across cohorts, taking into account the number of years between first publication, departure from Germany, and return to Germany. However, the proportion of female researchers among those who returned to Germany was lower than it was among those who left, which indicates that female outward researchers have a greater tendency than their male counterparts to live abroad for longer periods, or possibly to settle down in other countries. When we looked at the proportions of female researchers in the two subpopulations of interest disaggregated by cohort and discipline, we found that both the outward and the returnee subpopulations in most disciplines were more male-dominated than the overall population of researchers in that discipline, in line with the greater gender disparities observed among all German-affiliated migrant researchers in most disciplines (Zhao et al. 2021). These findings suggest that the gender imbalance in the German science system (with respect to scholars who started publishing in Germany) may be intensified by the subgroups who are returning to Germany being more male-dominated than the subgroups who are leaving Germany.

Finally, we looked at the interplay between the degree to which researchers continued to collaborate with German institutions while abroad, and their corresponding return rates. The results showed a moderate positive association between collaboration and return rates across disciplines. After cohorts were introduced into the analysis, the return rates decreased with successive cohorts, while the collaborative ratios increased on average. In the fields of medicine, health professions, physics, and psychology, the likelihood of collaborating with Germany and of returning to Germany were both higher than the total averages. In contrast, researchers in the fields of engineering, computer science, and economics had both lower collaboration and lower return rates than the total average. To tackle the challenge of talent loss in STEM fields, and to attract and retain STEM researchers from abroad, Germany, which already has a large number of initiatives for international researchers, like GAIN and GSO, would likely benefit from developing additional programs focused on STEM fields (OECD 2015).

Our study has several limitations, which can be addressed only through ongoing work and additional efforts. Our bibliometric analysis was based on the higher quality signals for researchers who have higher publication rates. Therefore, the reliability of our findings may not be the same for all fields, given that their average publication rates vary (e.g., physics vs. history). Another limitation is that we could not analyze migration events that were not captured in the publication data. In addition, because of the possible differences between publication years and migration years, the temporal patterns of the data should be interpreted with caution.

We recognize that bibliometric data, like other sources of big data, are not produced for use as research data, and are therefore susceptible to potential biases or errors. In our materials and methods, we outlined a series of pre-processing steps for systematically dealing with some of the data quality issues in our application context. Additional scientometrics research is needed to better identify the potential quality problems with bibliometric data, and to find systematic and effective remedies for addressing them.

As well as contributing to the literature on the migration of researchers (Moed and Halevi 2014; Aman 2018a, b; Aref et al. 2019; Andrey and Elena 2019; Robinson-Garcia et al. 2019; Miranda-González et al. 2020; Subbotin and Aref 2021; El-Ouahi et al. 2021) in the context of Germany (Netz and Jaksztat 2014; Parey et al. 2017; Zhao et al. 2021), more importantly, our research fills a critical gap in the research on the return migration of scholars, which is a novel subject in the bibliometric analysis of academic migration. This work, which represents a continuation of Zhao et al. (2021), aimed to provide a policy-relevant descriptive analysis of return migration among researchers by taking their levels of experience, gender, disciplines, and cohorts into account. Obtaining insights into researchers who have left Germany, including into their age, gender, and characteristics that could influence their potential return to Germany, is a key step towards understanding migration among scholars as a concept that is more nuanced than a one-off relocation event.

A number of interesting questions still remain to be investigated, including the question of what personal and professional factors drive the international migration of researchers. Differences in levels of support for parenthood between Germany (Gangl and Ziefle 2009; Lutter and Schröder 2020) and other countries (Morgan et al. 2021) may have a bearing on some of the observed gender disparities. Combining different data sources could allow us to expand the analysis and examine other critical topics, like parenthood policies. Investigating the citation performance of outward and returnee researchers could provide us with additional insights into the individual-level consequences of scholars’ migration decisions. In addition, the observed association between return migration and personal and professional factors, including disciplines and collaborative ties, can be further investigated with the aim of finding the mechanisms involved, such as the emergence of discipline-specific centers that are particularly attractive for migrant researchers.