1 Introduction

In life science research on humans, it is common to divide people into groups using categories that carry more than merely descriptive and systematizing meanings. The use of terms such as “race”, “ethnicity”, “ancestry”, and “migration background” can have a variety of scientific and social consequences. For this reason, human categorizations in the life sciences are highly controversial and subject to ongoing criticism from both within and outside the life sciences. First, categories such as race and ethnicity have often been criticized for not adequately reflecting biological diversity [1,2,3,4,5]. Second, classifications such as race and ethnicity inextricably carry social meanings and reflect the socio-cultural conditions of their origin; they are thus questionable as biological categories [6,7,8,9,10,11,12]. Third, most classification terms are loaded with historical injustice, which can also make their unquestioned use in current research problematic [5, 13].

The German-language research landscape is of particular interest here because of the history of German scientific racism and the persecution and extermination policies pursued under National Socialism [14]. One effect of this history is that the term “race” (and its German equivalent “Rasse”) is no longer commonly used. In contrast to usage in the UK and especially the US, it carries an almost exclusively biological meaning and is primarily associated with National Socialism and colonialism [15,16,17]. For this reason, many key political and scientific actors have rejected the concept, and Germany is often considered to have a special responsibility to reject the discourse of race altogether. Against this background, the aim of this article is to provide insight into the spectrum of classifications used in life science research at German scientific institutions. The research questions pursued in this analysis are: first, what terms are used to categorize humans in scientific papers published by German research institutions? Second, what differences in designation practices exist among disciplines? And third, can features be identified that distinguish studies carried out in a specifically German research context from those that are more internationally positioned? The current analysis is part of a larger study of human classifications in the life sciences in the German context, which aims to map the practices of categorizing human diversity in contemporary life science disciplines.

This reconstruction of the classification practices of different disciplines contributes to ongoing debates and research about the social repercussions of differentiating human groups in science as well as, conversely, the effects of biological classifications on society. This international debate focuses on the dangers of racialization, essentialization, and stereotyping. Critical research of this kind also takes into account the construction of categories; the political, social, and historical backgrounds of classification systems; and the political and academic practices of nation-, culture-, and time-specific attempts at classification [7, 18, 19]. Categories and classification systems should thus be understood not as a reflection of biological differences, but as social and, above all, political distinctions. Moreover, they do not simply represent perceived inequality, but are also attributive and reality-producing [20].

Editors of a variety of life science journals have sought to address these issues by publishing author guidelines for the responsible and non-harmful use of human classifications (see for example [21, 22]). However, as several reviews in recent years have shown, the publication of guidelines has not necessarily improved the quality of race/ethnicity reporting over the past few decades [23,24,25,26].

This article begins by providing a brief overview of the situation regarding human classifications in Germany and specifically the question of “race”. This is followed by a presentation of our methodology, our findings, and, finally, a discussion of key aspects.

2 The apparent absence of race in Germany

There are two main reasons for the limited use of the term “race”/“Rasse” in current writing. First, its association with the National Socialist “race laws” and the persecution of Jews in the Third Reich has discredited the term “Rasse” itself. Second, this has resulted, from the 1980s onwards, in a tendency to avoid the term in both political and scientific discourse and to use alternative concepts for human differentiation such as “ethnicity”, “ancestry”, or, most recently, “migration background”. This is true not only of the life sciences: there have been extensive efforts to remove the term “race” from official documents and legal texts, such as the German constitution and state constitutions [27].

However, neither the term “race” itself nor the practice of racialization by means of alternative terms has been extinguished. This is demonstrated by the ongoing production of statements against biological race, such as the recent Jena Declaration, which can be interpreted as signifying the persistence rather than the end of the concept. This declaration was published in 2019 by leading German zoologists and human geneticists on the occasion of the annual meeting of the German Zoological Society; its authors demanded that “[t]oday and in the future, not using the term race should be part of scientific decency” [28], with the aim of overcoming biological justifications for racial discrimination. Evidently, such statements continue to be necessary, as the complex relationship between the life sciences and race is far from settled. In addition, several studies have shown that the concept and its derivatives continue to exist in a range of life science fields in Germany, including biology [17, 29,30,31], medicine [16, 32], genetics [33,34,35], and psychology [36], as well as in life science research in other European countries [13, 37,38,39]. Most of these studies, however, provide rather cursory or anecdotal insights into the various disciplines and practices of human classification in the German life sciences. To date, no comprehensive empirical assessment of the use of human classifications in the German life sciences has been undertaken.

3 Methods

3.1 Systematic literature review

We performed a systematic literature search using PubMed and Web of Science (WoS) with the aim of identifying German life science studies that applied race and other potentially racializing classifications to human research subjects. In order to study the output of the German scientific community, we focused on publications by research institutions located in Germany. “Life sciences” was interpreted broadly as the study of human life using scientific methods, thus encompassing all medical and biological disciplines as well as psychology.

Methodologically, we followed the PRISMA 2020 guidelines for systematic reviews and meta-analyses (see Fig. 1 and the checklist in supplementary material 1) [40]. In our preliminary exploratory research, we had identified “race”, “ethnicity”, “migration background”, and “ancestry” and their cognates as the terms most commonly used in the life sciences to represent human diversity, and thus chose these words as search terms. The commonly used term “population” was not included because the corresponding search string returned a large number of articles that were not relevant to our research question (e.g., population-wide studies) and that could only have been processed at a disproportionate cost in time.

Fig. 1 Flow chart for the selection of studies for the systematic literature review. Adapted from Page et al. [40]

We limited our review to primary research (quantitative and qualitative) in English and German and excluded meta-analyses and case reports, but included studies in which the authors used existing datasets to address their own research questions. We thus focused on studies in which the authors themselves designed study and/or analysis protocols and could choose a system and terms to describe and/or differentiate research subjects. In order to narrow our scope to the German life sciences, we included only studies in which at least the first or last author was affiliated to a research institution located in Germany. To focus on current studies, the search was limited to articles published between 2018 and 2020. This timeframe makes it possible to assess the potential influence of the Black Lives Matter movement on the choice of classification terms in 2020; indeed, one scientist we interviewed for a subsequent qualitative study raised the topic independently. This crucial question will be the focus of future investigation. The last search was performed on April 13, 2021, to allow for the lag between publication and indexing in scientific databases. The laborious coding process we used (see below) goes beyond that of an ordinary systematic review and was completed in early 2022. Using the search term “((race) OR (migra*) OR (ethnic*) OR (ancestry)) AND (German*)”, we identified 3982 records in the PubMed database published within the selected time frame. For the WoS database, the search term “race OR migra* OR ethnic* OR ancestry” was used, and the results were refined to studies from Germany and from life science disciplines, yielding 6235 records.
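The truncation operator in these search strings (“migra*”, “ethnic*”, “German*”) can be illustrated with a rough local approximation in Python. This is only a sketch under our own assumptions: PubMed and WoS apply their own term mapping, field tags, and tokenization, so the regular-expression translation and the example titles below are invented for illustration and do not reproduce the databases' actual search behaviour.

```python
import re

# Search terms from the PubMed query; a trailing "*" marks truncation.
CLASSIFICATION_TERMS = ["race", "migra*", "ethnic*", "ancestry"]
LOCATION_TERMS = ["German*"]


def term_to_regex(term):
    """Compile one search term; a trailing '*' means 'any word continuation'."""
    stem = term.rstrip("*")
    suffix = r"\w*" if term.endswith("*") else ""
    # Word boundaries keep "race" from matching inside words such as "trace".
    return re.compile(r"\b" + re.escape(stem) + suffix + r"\b", re.IGNORECASE)


def matches_query(text):
    """Approximate (race OR migra* OR ethnic* OR ancestry) AND (German*)."""
    has_classification = any(term_to_regex(t).search(text) for t in CLASSIFICATION_TERMS)
    has_location = any(term_to_regex(t).search(text) for t in LOCATION_TERMS)
    return has_classification and has_location


# Invented example titles.
titles = [
    "Migration background and dementia prevalence in Germany",
    "Trace elements in German soils",
    "Ethnic differences in asthma: a US cohort",
]
for title in titles:
    print(matches_query(title), "|", title)
```

Only the first title passes: the second lacks a classification term (“race” inside “trace” is not a word match), and the third lacks a “German*” term.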
From the total of 10,217 records from both databases, 9415 were excluded from further analysis because they were duplicates, did not meet our search criteria, or did not fit the scope of our review (because no human subjects were studied or artificial data were used; because the article did not report primary research results or addressed only single case reports; or because the subjects were not categorized or described using human diversity classifications). This left 802 relevant publications, which were then read thoroughly. Another 256 publications were excluded because the full text was not available online, via library access, or after an email request to the authors; because the studies turned out not to be from the life sciences after all; or because one of the previously listed exclusion criteria applied (Fig. 1). The final article database contained 546 articles matching the scope of our review, of which 46 (8.4%) were in German and the rest in English.
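The record counts reported above chain together arithmetically. The following Python check simply re-adds them (the variable names are ours) and confirms that the screening steps lead from 10,217 identified records to the 546 included articles, 8.4% of which are in German.

```python
# Record counts as reported in the text (variable names are ours).
pubmed_records = 3982
wos_records = 6235
excluded_at_screening = 9415  # duplicates, outside search criteria, out of scope
excluded_at_full_text = 256   # no full text, not life sciences, other criteria
german_language_articles = 46

identified = pubmed_records + wos_records
read_in_full = identified - excluded_at_screening
included = read_in_full - excluded_at_full_text

print("identified:", identified)      # 10217
print("read in full:", read_in_full)  # 802
print("included:", included)          # 546
print(f"German-language share: {german_language_articles / included:.1%}")  # 8.4%
```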

3.2 Quantitative analysis

To perform a quantitative content analysis [41], we used MAXQDA Standard 2020 20.1.0 (VERBI GmbH Berlin) software. We developed a coding system comprising the following categories: authors (sub-codes: all authors affiliated with German institutes; first and last author so affiliated; only first or last author so affiliated), discipline (by research institutions of first and last author), categories (which terms were used to classify or define groups of subjects), and study location (country where the samples or data were collected).

With regard to the classification terms, we focused both on terms used in the methods sections to describe or stratify research subjects and on terms that authors used in the discussion section, when interpreting their results, to describe the cohort they had previously analyzed. By explicitly restricting our analysis to the terms used by the authors of each study to describe their own cohort, we examined the terms in context rather than merely counting them. Where studies did not name the location where research subjects were examined or sampled, we coded them either as “not specified”, when the information could not be determined without guessing, or as “information provided elsewhere”, when the authors referred to other research articles.

Focusing on the methods, results, and discussion sections, members of our research group read and coded all the articles thoroughly. When uncertainties arose, the publications and codes in question were discussed within our group until a consensus was achieved.

The absolute and relative frequencies of coded studies were calculated for each sub-code.
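Because most studies were assigned more than one term, the relative frequency of a sub-code is the share of studies carrying that code, not a share of all code assignments; this is why the percentages reported in Table 1 add up to more than 100%. The Python sketch below illustrates this counting rule on invented toy data; the study IDs and code sets are hypothetical, and the actual tabulation was done from the MAXQDA coding.

```python
from collections import Counter

# Hypothetical toy data: each study ID maps to the set of umbrella terms
# coded for that study. One study can carry several codes.
coded_studies = {
    "study_01": {"ethnicity", "migration"},
    "study_02": {"migration"},
    "study_03": {"race", "ethnicity"},
    "study_04": {"ancestry", "population"},
}


def code_frequencies(studies):
    """Return {code: (absolute count, share of all studies)}."""
    counts = Counter(code for codes in studies.values() for code in codes)
    n_studies = len(studies)
    return {code: (k, k / n_studies) for code, k in counts.items()}


for code, (k, share) in sorted(code_frequencies(coded_studies).items()):
    print(f"{code:<11}{k:>3}{share:>8.1%}")
```

In this toy example the shares sum to 175%, since three of the four studies carry two codes each.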

We fitted seven generalized linear models with a binomial error distribution, each using one of the seven most common terms (“ethnicity”, “migration”, “ancestry”, “race”, “population”, “origin”, and “refugee/asylum seeker”) as the response variable. Predictors were author affiliation, sampling location, their interaction, and the three most common disciplines (medicine, epidemiology, and psychology). We compared models with all possible combinations of predictors and retained every predictor that appeared in at least one model with a relative model weight ≥ 0.05. We built the final models using only those predictors and calculated their adjusted R² values. All analyses were conducted in R [42]; models were created with the package “lme4” [43], model comparisons were performed with the package “MuMIn” [44], and adjusted R² values were calculated with the package “rsq” [45].
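The predictor-selection rule can be illustrated with Akaike weights, w_i = exp(-Δ_i/2) / Σ_j exp(-Δ_j/2), where Δ_i is a model's information-criterion distance from the best-ranked model. The text does not state which criterion was used; we assume AICc, the default ranking criterion of MuMIn's `dredge()`. The sketch is written in Python for illustration rather than the R workflow described above, and the candidate models and their AICc values are invented.

```python
import math

# Hypothetical candidate models for one response term: each entry holds the
# predictors included and an invented AICc value (lower is better).
candidates = [
    ({"authors", "location", "authors:location"}, 512.3),
    ({"authors", "location", "medicine"}, 513.1),
    ({"location"}, 518.9),
    ({"authors", "psychology"}, 530.4),
]


def akaike_weights(ic_values):
    """Turn information-criterion values into relative model weights."""
    best = min(ic_values)
    # Relative likelihood of each model: exp(-delta / 2).
    rel_lik = [math.exp(-(ic - best) / 2) for ic in ic_values]
    total = sum(rel_lik)
    return [r / total for r in rel_lik]


def retained_predictors(models, threshold=0.05):
    """Collect every predictor appearing in a model with weight >= threshold."""
    weights = akaike_weights([ic for _, ic in models])
    kept = set()
    for (predictors, _), weight in zip(models, weights):
        if weight >= threshold:
            kept |= predictors
    return kept, weights


kept, weights = retained_predictors(candidates)
print([round(w, 3) for w in weights])
print(sorted(kept))  # ['authors', 'authors:location', 'location', 'medicine']
```

Here the third model (weight of about 0.02) falls below the 0.05 threshold, so “psychology”, which appears only in an even lower-weighted model, is dropped from the final model.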

4 Results

4.1 Great diversity of classification terms used in the German life sciences

With our broad search strategy, we were able to build a dataset that gives a substantial overview of the current research landscape in the relevant fields. The 546 studies identified included a great variety of terms employed to stratify or describe research subjects. Within our dataset, we registered 136 different classifications. Derivatives were coded under umbrella terms; e.g., “immigrant”, “migrant”, “migration background”, and all variations and German equivalents of these terms were subsumed under the term “migration” (see supplementary material 2). After merging related terms in this way, 34 different classifications remained. Table 1 lists those applied in more than 2% of the studies analyzed, together with their frequencies. The majority of studies (54.8%) applied several different terms to describe a cohort, e.g., stating that people have a certain “ethnicity” or a “migration background”, often using the terms interchangeably.
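Merging derivatives under umbrella terms amounts to a lookup from raw terms to umbrella terms. The Python sketch below shows the idea with a hypothetical excerpt of such a table; the entries are illustrative only and do not reproduce the full coding scheme in supplementary material 2.

```python
# Hypothetical excerpt of an umbrella-term lookup table (illustrative only).
UMBRELLA_TERMS = {
    "immigrant": "migration",
    "migrant": "migration",
    "migration background": "migration",
    "migrationshintergrund": "migration",  # German equivalent
    "ethnizität": "ethnicity",             # German equivalent
    "rasse": "race",                       # German equivalent
}


def to_umbrella(raw_term):
    """Map a raw classification term to its umbrella term (unmapped terms pass through)."""
    key = raw_term.strip().lower()
    return UMBRELLA_TERMS.get(key, key)


raw_terms = ["Migrant", "Migration background", "Ethnizität", "ancestry", "immigrant"]
distinct_raw = {t.strip().lower() for t in raw_terms}
merged = {to_umbrella(t) for t in raw_terms}
print(len(distinct_raw), "raw terms ->", len(merged), "umbrella terms:", sorted(merged))
# 5 raw terms -> 3 umbrella terms: ['ancestry', 'ethnicity', 'migration']
```

Unmapped terms such as “ancestry” pass through unchanged, so the number of umbrella terms can only shrink relative to the number of raw terms, as it did from 136 to 34 in our dataset.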

Table 1 Human classifications used in life science studies from Germany

Of the original search terms in our systematic literature review, the classification “ethnicity” was used most often to describe the cohort studied (45.5%), followed by “migration” (33.6%), “ancestry” (17.7%), and “race” (13.9%). We identified additional terms that were used in a large proportion of studies within our database. Of these, the most common were “population” (27.8%), “origin” (14.6%), and “refugee” or “asylum seeker” (10.6%).

4.2 German influence on classifications

We recorded whether all authors (238 studies), the first and last author (119), or only the first or the last author (189) were affiliated to a German research institution. The terms used varied noticeably with the degree of “German-basedness” of the author teams (Fig. 2) as well as with where the study was conducted (Fig. 3). Most notably, the term “race” was rarely applied in studies by all-German author teams or teams with a German first and last author (4.6% and 9.2%), but was used in nearly 30% of studies in which only the first or the last author was affiliated to a German institution. This divide was also noticeable for the terms “ethnicity” and “ancestry”. As described above, “ethnicity” was overall a more popular term than “race” within our database and was often used alongside other terms. But while it was used in 28.6% of studies by all-German author teams, around 60% of studies by author teams less affiliated to German research institutions used it. “Ancestry” was used in only 6.3% of studies by all-German author teams, but in 31.1% and 23.8% of studies with a German first and last author or a German first or last author, respectively. In contrast, the majority of all-German author teams used the term “migration” (57.1%), while this term was used much less often in studies by “less German” author teams (22.7% and 11.1%) within our database. These descriptive results are supported by our quantitative analyses, in which author affiliation was a significant predictor of the use of “ethnicity”, “race”, “ancestry”, and “migration” (see supplementary material 3).
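The overall frequency of “race” reported in Section 4.1 (13.9%) is, up to rounding, the size-weighted average of the group-specific shares. The following Python check back-calculates approximate study counts per affiliation group, assuming that the reported percentages were rounded from whole counts and taking the 28.6% figure given in Section 5.2 for teams with only a German first or last author.

```python
# Share of studies using "race" in each affiliation group, with group sizes
# from Section 4.2 and the 28.6% figure quoted in Section 5.2.
race_by_group = {
    "all authors German-affiliated": (238, 0.046),
    "German first and last author": (119, 0.092),
    "German first or last author only": (189, 0.286),
}


def pooled_share(groups):
    """Back-calculate per-group counts (assumes shares were rounded from
    whole counts) and return them with the total count and pooled share."""
    counts = {name: round(n * share) for name, (n, share) in groups.items()}
    total_studies = sum(n for n, _ in groups.values())
    total_count = sum(counts.values())
    return counts, total_count, total_count / total_studies


counts, total, share = pooled_share(race_by_group)
print(counts)
print(total, f"{share:.1%}")  # 76 13.9%
```

The recovered total of 76 of 546 studies matches the 13.9% reported for “race” overall, which shows how the large share in the smallest-affiliation group drives the overall figure.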

Fig. 2 Frequency of classification terms used in relation to the affiliation of author teams to German institutions

Fig. 3 Frequency of classification terms used in relation to sampling location. Some studies examined research subjects in several countries, so articles categorized as “German”, “USA”, and “other” overlap

In German-language publications from all-German author teams, “Rasse” (race) was never used (only one German-language article had authors who were not all affiliated to German research institutions). In these German-language articles, the term “Migration” (migration) was even more common than in English-language publications from all-German author teams (82.2% vs. 52.6%), and “Ethnizität” (ethnicity) was used less often (8.9% vs. 33.3%).

The articles analyzed human samples collected in 62 different countries. Most of them studied research subjects in Germany or the USA (50.6% and 11.2% of articles, respectively). In 35 studies (6.4%), the authors did not make sufficiently clear in which country their research subjects were examined or sampled, either by using extremely broad terms like “Europe” or “Africa” or by not mentioning a location at all.

While study authors who recruited research subjects in Germany most often described and stratified them as “migrants” or people with a “migration background” (58.5%), only 6.6% of studies with sampling in the USA used such a category. In contrast, 63.9% of studies with datasets collected in the USA used the term “race”, while only 3.3% of studies performed in Germany used “race” to describe research subjects (Fig. 3). Our quantitative analyses confirm this assessment: sampling location was a significant predictor of the use of “race” (see supplementary material 3), as it was for nearly every term. The exceptions are “population” and “ancestry”, whose use is similar for samples from Germany and from the USA but more common when the sampling location was “other”.

4.3 Disciplinary differences in the use of classifications

According to our dataset, research from different life science disciplines applied human classifications at varying frequencies to describe and differentiate research subjects (Fig. 4). Treating all medical disciplines as a single category, we coded 42 different life science disciplines based on the first and last authors’ institutional or departmental affiliations. Most of the articles analyzed were written by authors from the field of medicine (74.0%), followed by epidemiology (24.7%) and psychology (14.5%); many were published by scientists from more than one discipline (44.9%). While nearly half of the author teams working in medicine sorted their research subjects by “ethnicity” (48.3%), “migration” was the most frequently used term in the epidemiology and psychology papers we analyzed (58.5% and 44.3%, respectively). However, neither of these terms was used frequently in archaeogenetics, where “ancestry” and “population” were most often applied when grouping human research subjects (90% and 80%, respectively). Our quantitative analyses confirm these observations (see supplementary material 3).

Fig. 4 Frequency of classification terms used in different research areas. Many of the studies analyzed were assigned to several disciplines, so research areas overlap

5 Discussion

Our quantitative analysis of 546 studies published between 2018 and 2020 by authors affiliated to German institutions confirms the results of smaller or more specialized reviews that have documented the wide range, vagueness, underdetermination, and inadequacy of the human classifications used in the life sciences. At the same time, our results substantiate the expected specificity of scientific research performed in Germany, where use of the term “race” is relatively rare.

5.1 Variety and inconsistencies

In their literature review, Zhang and Finkelstein examined the different racial and ethnic categories used in pharmacogenetic research [46]. They concluded that there is a high degree of heterogeneity in the categories used to investigate the distributions of certain genotypes. In the field of epidemiology, Bokor-Billmann et al. analyzed articles in top-ranking journals for their use of terms relating to ethnicity and race and identified 81 different methods that authors used to classify such categories [23]. A recent review of studies in the field of ophthalmology concluded that “the categories used were heterogeneous and often inconsistent” [47]. Our analysis confirms these previous observations: research subjects were classified using 146 different terms, and even after we had grouped them, 34 different classifications remained. Interestingly, many authors in our database used several terms alongside one another (see supplementary material 3), sometimes even as proxies or synonyms. For example, Monsees et al. studied the prevalence of dementia in people with a “migration background”. However, in the caption of a table summarizing the results, “Ethnien” (ethnic groups) is used, and these ethnic groups are then differentiated by “Nationalität” (nationality) [48]. While a comprehensive analysis of how the studies in our database defined and applied these classifications is still in progress, this finding points to a lack of awareness that terms such as “ethnicity”, “migration background”, and “race” draw on different societal and scientific concepts and discourses.

One factor behind the observed divergence between studies is that, although English is the standard language of science, different national and cultural contexts shape classifications [20]. For example, Zhang and Finkelstein noted a high degree of heterogeneity across countries in the number of ethnic categories used to classify research subjects as “Asian” or “White” [46]. As anticipated, we observed a noticeable association between the degree of “German-basedness” of author teams and the terms they used.

5.2 No race in German science?

At first glance, our analysis may give the impression that the German research landscape is post-racial. This is because, first, the German term “Rasse” appears in none of the 46 research publications written in German and, second, the term “race” is used far more rarely by author teams in which all members are affiliated to German research institutions than by more international research teams (4.6% vs. 28.6%). Third, categories of race were also very rarely applied to research subjects sampled or examined in Germany, compared with other study locations.

However, aside from the explicit use of the term “race” in the methods section to classify research subjects, we often observed an additional, less precise use of the term, including with reference to the cohort under investigation: many authors still use race in the discussion section when comparing their own results with the findings of other studies. Kridin et al., for example, investigated specific health differences between people of Jewish and Arab ethnicity in Israel. In the discussion section, the authors contrasted their results for “patients of Jewish ancestry” with a study on people of “African race”, implying that the categories used in the two studies are comparable [49]. This kind of switching between and equating of human classifications in the discussion of results is not currently covered by the usual journal guidelines. For example, the recommendations of the International Committee of Medical Journal Editors [50] only advise defining how race or ethnicity was determined and using precise language with reference to the participants of a given study.

Instead of race and ethnicity, author teams affiliated to German research institutions often use the term “migration” (including migrant, immigrant, migrant background, etc.) to describe and stratify their research subjects. This is especially true of German-language publications. Further investigation is ongoing to establish whether “migration” is simply used as a substitute for socially controversial terms, or whether the language of race and ethnicity is rarely used because German scientists seldom see any scientific or ethical benefit in it. Preliminary results of our analysis for the field of epidemiology suggest, on the one hand, that the choice of categories may also relate to the terminology used in the political sphere. Germany, unlike the US, has no ethnic census categories, but recently introduced the variable “migration background” in its annual microcensus to record first-, second-, or third-generation immigrants who may hold German citizenship and therefore can no longer be tracked using the former category “Ausländer” (foreigner) [16, 51]. German epidemiological studies have since begun to apply this category instead of race or ethnicity when samples are collected in Germany, arguing, e.g., that Germany is “an immigration country without post-colonial migration and without numerically relevant autochthonous ethnic minorities” [52]. On the other hand, migration background is often used interchangeably with ethnicity, perhaps to ensure compatibility with the international discussion [51].

Finally, we note that the terms “refugee” and “origin” are used more frequently the more German-affiliated authors are involved in a study. The use of the former possibly reflects research interest in the situation of refugees following the increase in public discourse on immigration from 2015 onwards. The predilection for the English term “origin”, which native speakers rarely use in this sense, could be understood as an expression of linguistic uncertainty. One might even suspect that the somewhat indeterminate nature of the term encourages its deliberate use.

5.3 Disciplinary classification cultures

Our literature review yielded studies from a broad variety of life science fields, including clinical studies by pharmaceutical and biotech companies. Nearly half of them were published by authors from different research fields, suggesting a relatively high degree of interdisciplinarity in German research on human diversity. However, nearly three-quarters of the studies were affiliated to medical departments. It is unclear whether this reflects a general dominance of medicine in German life science research (due to research funding allocation and publication cultures) or whether medical studies are more likely to apply human classifications. Because few studies came from disciplines other than medicine, epidemiology, and psychology, our findings for those disciplines should be interpreted with caution. Nonetheless, while these three disciplines show a rather similar frequency distribution of the classification terms ethnicity, ancestry, race, and migration, we observed differences in other disciplines. Studies from archaeogenetics and human genetics, for example, mainly refer to “ancestry” to describe or divide their research subjects. Here, “ancestry” is usually genetically determined and is sometimes even used to verify the self-reported ethnicity of research subjects [53]. This is in accordance with observations by Fujimura and Rajagopalan, who studied population geneticists in the US [29]. Most studies in these fields, especially those investigating human genetic history, are designed by large international author teams. Thus, the unpopularity of the term race in these fields can probably be attributed to disciplinary culture rather than to German scientific culture [54, 55].

In contrast, one third of the 17 clinical studies by pharmaceutical companies within our database used the term race (compared with 13.9% across the whole database). This is likely driven by US standards, as the US Food and Drug Administration requires the reporting of race/ethnicity for drug approvals. How this classification is then applied to a German study population, and to what effect, remains to be investigated.

5.4 Limitations

Important classificatory terms such as “population” and “origin” are certainly underrepresented in our review sample, as these terms generated a plethora of non-specific search results that we did not want to include in our review or could not meaningfully evaluate with the resources available to us. Although necessary, this decision may have led us to overlook other common classificatory terms, biasing our data towards the most commonly applied terms. In creating umbrella terms to summarize the many different classifications used by researchers to describe and categorize the people they studied, we may have inadvertently merged terms that are conceived of very differently in practice. For example, “migration background” may often be operationalized according to the official German census definition, while the classification “immigrant” may not. Furthermore, our decisions in determining what constitutes the “German life sciences” may have affected the inclusion or exclusion of some research papers. Publication language is not a useful indicator, as, apart from epidemiological studies, relevant scientific results from German research institutions are nowadays published almost exclusively in English-language journals [56]. Research teams at German institutions are also frequently very international; e.g., in 2017 nearly half of the scientists employed at the Max Planck Society were not German nationals [57]. In addition, German scientists are often expected to work abroad during the postdoctoral phase of their careers, which may have a lasting impact on their research activity. Thus, affiliation to a German institution may be an imprecise surrogate marker for studying historical and cultural differences along national and linguistic lines. Similarly, it was sometimes difficult to determine the discipline of a given study without expertise in the specific research topic.
As this article is published by an interdisciplinary team of authors, most of whom are affiliated to a sociological institute, we are aware that our approach may have overlooked disciplinary subtleties. However, we are confident that the search strategy of our literature review captured a broad overview of life science studies affiliated to German research institutions with respect to our research question.

6 Conclusion

Our quantitative content analysis of the research literature illustrates that the current German life sciences are prone to the same heterogeneous use of human classifications found in the international studies cited. Additionally, we identified a distinctive relationship with the term “race”, which was used less the more German-based an author team was. The often interchangeable use of terms such as race, ethnicity, ancestry, migration, etc. that we registered within individual studies indicates a conceptual confusion and underdetermination that can lead to overgeneralization and stigmatization. Regardless of whether human classifications are included in research to measure and combat health inequalities, trace human history, or identify population-specific therapeutic targets, terms like race, ethnicity, and migration background cannot be separated from their social meanings. As Unger et al. write of the public health categories applied to im/migrants and ethnic minority groups, these categories “serve multiple functions and thus unfold ambivalent power effects” and can carry a wide range of connotations, from supposed vectors of disease to a particularly vulnerable population [20]. Categories are needed to make health inequalities based on exclusion and discrimination visible, but they can also create the very basis for exclusion and discrimination. In other parts of our ongoing research, we are analyzing the underlying meanings of the terms used in more depth by means of qualitative document analysis and interviews with investigators based in Germany [58]. One question we are pursuing is whether and how an “absent presence” of race, i.e., racialization while avoiding the term “race”, as has been found for other European countries [37], also manifests in Germany.
Finally, we underline that imprecise and vague classifications also impede the national and international transferability of scientific results, thereby diminishing even the intended impact of the research in question.

We are well aware that the negotiation and decision-making processes inherent in every step of our analysis, from outline to coding practices to the formulation of results, give this review a very specific angle. Some of the decisions made (e.g., how to demarcate which publications count as German research) are, of course, as imperfect and as open to negotiation as the classifications analyzed themselves. Still, we are convinced that this overview and the aspects we highlighted provide a valuable foundation for a trans- and international debate on the potentials and pitfalls of classification practices in the current life sciences.