Introduction

Critical differences exist in the physiology, geography, and lifestyles of patients of various genealogies and ancestries. When examining how medical procedures, diseases, and cancers will affect a patient, clinicians must consider the patient's ancestral and genetic background to optimize medical care [1,2,3]. The ancestral background of tissue donors has also been recognized as a potential source of variability in the development of regenerative engineering products [4]. For a wider range of patients in the clinic to benefit from biomedical research, we propose that tissues and cells used on the bench warrant similar ancestral characterization. Some cell lines have already been genotyped to improve cancer research [5]. Unfortunately, such characterization is not yet standard practice [5].

The use of the concepts of race, ethnicity, and ancestry in biomedical research is a subject of intense debate. The concept of race originated primarily as a means to justify social inequalities, segregation, and slavery [6]. The findings of genomic studies do not correlate neatly with the social concept of race, although some traceable differences between populations do exist [7, 8]. Considering a person’s race and/or ethnicity may shed light on how social factors and experiences of racism influence a patient’s health [8]; however, a person’s race does not always align with their genealogical ancestry. Ancestry describes an individual’s similarity to others of a common geographic background rather than relying on traits such as hair type and skin color [7, 9]. Ancestry is therefore a more inclusive tool than race for studies of population-level health tendencies.

Several strategies are currently used for reporting race or ancestry. In this review, we use the five categories of self-reported race used in the current U.S. Census: White, Black or African American, American Indian or Alaska Native, Asian, and Native Hawaiian or Other Pacific Islander [10]. We do not claim that these categories are all-encompassing. Indeed, a different scheme commonly used in genotyping studies, and therefore probably better suited to discussions of ancestry, includes seven categories: African, Native American, North East Asian, South East Asian, South Asian, North European, and South European [5]. We chose the U.S. Census racial categories because they are broad enough to include as many literature sources as possible, regardless of how specifically each article reported race or ancestry. The results serve to illustrate our main claim: ancestry is underreported in in vitro studies.

Race/Ancestry Is Underreported in In Vitro Biomaterial-Based Studies

We conducted a literature search to estimate how frequently researchers reported the ancestral background of donors when conducting in vitro experiments with human primary cells or immortalized cell lines. We surveyed articles published during the six-month period from July 1 to December 31, 2019. This timeframe provides a snapshot of recent practice without the confounding effects of the disruptions to research caused by the COVID-19 pandemic. The ten biomaterial journals we targeted were ACS Biomaterials Science and Engineering, Advanced Healthcare Materials, Frontiers in Bioengineering and Biotechnology, Journal of Biomedical Materials, Journal of Translational Medicine, Lab on a Chip, Nature Biotechnology, Nature Biomedical Engineering, Science Translational Medicine, and Scientific Reports. We reviewed the main text of 202 communications and articles describing in vitro cell culture experiments (Table 1). Together, these 202 articles reported 341 instances of human cell use.

Table 1 Journals investigated. Articles published between 1 July 2019 and 31 December 2019 were included in our analysis

For every article, we recorded the body tissue from which the cells were derived, whether the cells came from an immortalized cell line, the donor’s ancestry (either stated in the paper or inferred from outside sources such as public databases or cell suppliers), and the donor’s sex. Consistent with previous work [11, 12], we found that race/ancestry is poorly represented in the literature.

Of the 341 instances of human cell use, 63% involved commercially available immortalized cell lines and 35% involved primary samples. The remaining 2% involved unspecified immortalized cell lines. We grouped these unspecified lines with primary cells in later analyses because, unlike commercially available lines, the donor demographics of immortalized cells produced in-house cannot be searched in public databases.

We then examined the articles for ancestry information. Many sources reported only race or only ancestry. To analyze all available information together, and because race is the less specific metric, we re-categorized cells with reported ancestry according to the donor’s probable race. For these cells, ancestry was reported as the percentage of the genome corresponding to each ancestral group (African, Native American, North East Asian, South East Asian, South Asian, North European, and South European [5]). To map these data onto U.S. Census race categories, we took the ancestral group with the highest percentage for each cell line as the primary determinant of its race. “North East Asian” and “South East Asian” were classed as “Asian,” “African” was kept as “African,” and “North European” and “South European” were classed as “White.” No other ancestral group was highly represented in the genomes of the cell lines we investigated. Table S1 lists all cell lines investigated, with the highest-percentage ancestral group and, where applicable, the re-categorized race. We acknowledge that this re-categorization is an inherent limitation of this work; however, we use the U.S. Census categories to illustrate the overall dearth of race/ancestry reporting in the literature.
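The re-categorization rule described above can be sketched as a small function. This is a minimal illustration under stated assumptions, not the procedure actually used to build Table S1: the names `ANCESTRY_TO_RACE` and `recategorize`, and the input format (a mapping from ancestral group to genome percentage), are hypothetical. Groups for which the text gives no mapping (Native American, South Asian) return `None` rather than being assigned a category.

```python
from __future__ import annotations

# Hypothetical mapping that mirrors the rules stated in the text.
# Native American and South Asian are deliberately absent: the text gives
# no mapping for them because neither was highly represented in the lines studied.
ANCESTRY_TO_RACE = {
    "North East Asian": "Asian",
    "South East Asian": "Asian",
    "African": "African",
    "North European": "White",
    "South European": "White",
}


def recategorize(ancestry_pct: dict[str, float]) -> str | None:
    """Return a census-style race label for one cell line (sketch).

    The single ancestral group with the highest percentage is treated as the
    primary determinant of race. Returns None when no ancestry data are given
    or when the dominant group has no mapping above. Ties go to whichever
    group appears first in the input, because that is how max() behaves.
    """
    if not ancestry_pct:
        return None
    dominant = max(ancestry_pct, key=ancestry_pct.get)
    return ANCESTRY_TO_RACE.get(dominant)


# Illustrative, made-up percentages (not values from Table S1).
example = {"North European": 62.0, "South European": 30.0, "African": 8.0}
print(recategorize(example))  # White
```

As described, the rule compares individual ancestral groups rather than summed race categories, so a genome that is 40% North East Asian, 35% South East Asian, and 45% North European would still be labeled “White.”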

We identified 91 distinct human cell lines, used in 213 instances across the studies investigated. Genomic ancestry information was listed in Table S1 only if a percentage breakdown of ancestral background was reported. If only the donor’s geographic background was given, it was treated as race information (e.g., a donor reported to be Chinese was listed as Asian, with no ancestral information, because no genomic data were available). Seven cell lines were either produced in-house or listed without their full names and therefore could not be searched in public databases. For commercially available immortalized cell lines, we obtained sex and ancestry information by searching for each cell line name in the Expasy Cellosaurus database. For cell lines without ancestry data on Cellosaurus, we additionally searched the website of the vendor ATCC, which lists donor information for many of its products. We also searched Coriell.org, found a description of the origin of one cell line (NP460) in the literature [13], and found one cell line (NB1RGB) on cellbank.brc.riken.jp. Of the cell lines with searchable sex and ancestry information, 28 were from male donors, 32 were from female donors, and 4 were known to be contaminated with other cell lines. Of the non-contaminated lines with available ancestry information, 43 donors were White, 12 were Asian, and 5 were of African descent. Two of these 60 lines were listed in Cellosaurus as “problematic” due to misidentification, but because genomic ancestry data were available, they were included in our analyses.
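The search order described above (Cellosaurus first, then the vendor, then other repositories) amounts to a first-hit-wins lookup, followed by a tally over distinct lines that sets contaminated lines aside. The sketch below assumes each source has already been exported to a local dictionary keyed by cell-line name; the record fields (`sex`, `race`, `contaminated`), the function names, and the placeholder line names are hypothetical, and the code does not call any real database API.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable

# Each record is a plain dict, e.g. {"sex": "Female", "race": "White", "contaminated": False}.
# Field names are hypothetical; real databases use their own schemas.


def lookup(name: str, sources: Iterable[dict]) -> dict | None:
    """Return the record from the first source that lists this cell line."""
    for source in sources:
        if name in source:
            return source[name]
    return None  # e.g. in-house lines or lines listed without full names


def tally_distinct(names: Iterable[str], sources: list[dict]):
    """Count sex and race over distinct, searchable, non-contaminated lines."""
    sex_counts = Counter()
    race_counts = Counter()
    contaminated = 0
    for name in set(names):  # distinct lines, not instances of use
        record = lookup(name, sources)
        if record is None:
            continue  # not searchable
        if record.get("contaminated"):
            contaminated += 1  # counted separately, excluded from sex/race
            continue
        if record.get("sex"):
            sex_counts[record["sex"]] += 1
        if record.get("race"):
            race_counts[record["race"]] += 1
    return sex_counts, race_counts, contaminated


# Hypothetical sources, searched in priority order.
primary_db = {
    "LineA": {"sex": "Female", "race": "African", "contaminated": False},
    "LineB": {"sex": "Male", "race": "White", "contaminated": False},
}
vendor_db = {
    "LineC": {"sex": "Male", "race": "White", "contaminated": True},
}
used = ["LineA", "LineA", "LineB", "LineC", "InHouse1"]
sexes, races, n_bad = tally_distinct(used, [primary_db, vendor_db])
```

Here `LineA` is used twice but counted once, `InHouse1` is not found in any source and is skipped, and `LineC` is counted only as contaminated.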

Under 6% of primary cells used in the articles had information on race or ancestry, while about 78% of the cell lines investigated had a known race (Fig. 1). Among both primary cells and cell lines, the vast majority of cells with reported race were from White donors.

Fig. 1

Racial background reporting in primary or immortalized cells. Data from cells for which race information was obtained directly are pooled with re-categorized data from sources that gave ancestral information. If the same commercially available cell line was used multiple times among the articles surveyed, each instance of use is counted.
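The counting rules behind Fig. 1 can be summarized in a short sketch: every instance of use is counted separately, and instances from unspecified (e.g. in-house) immortalized lines are grouped with primary cells, as described for the survey data. The `Instance` type, its field values, and the example data are hypothetical illustrations, not the survey data themselves.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Instance:
    """One reported use of human cells in one article (hypothetical schema)."""
    source: str        # "commercial_line", "unspecified_line", or "primary"
    race: str | None   # None when no race/ancestry could be determined


def analysis_group(inst: Instance) -> str:
    # Unspecified immortalized lines cannot be searched in public databases,
    # so they are analysed together with primary cells.
    return "cell line" if inst.source == "commercial_line" else "primary"


def fraction_with_race(instances: list[Instance]) -> dict[str, float]:
    """Fraction of instances with known race in each analysis group.

    Every instance counts, so a line used in several articles contributes
    once per use.
    """
    totals: dict[str, int] = defaultdict(int)
    known: dict[str, int] = defaultdict(int)
    for inst in instances:
        group = analysis_group(inst)
        totals[group] += 1
        if inst.race is not None:
            known[group] += 1
    return {group: known[group] / totals[group] for group in totals}


# Made-up example data, for illustration only.
data = [
    Instance("commercial_line", "White"),
    Instance("commercial_line", "White"),   # same line reused: counted again
    Instance("commercial_line", None),
    Instance("commercial_line", "African"),
    Instance("primary", None),
    Instance("unspecified_line", None),     # grouped with primary cells
    Instance("primary", "White"),
    Instance("primary", None),
]
print(fraction_with_race(data))  # {'cell line': 0.75, 'primary': 0.25}
```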

Among tissue sources of cells used for in vitro studies, blood was by far the most highly represented source of primary cells, although very few studies using blood reported donor race (Fig. 2a). Notably, the representation of different racial or ancestral backgrounds varied widely among tissue types, especially in cancer cell lines. This may be due in part to differences in disease risk among populations. For example, Asian populations are at higher risk for nasopharyngeal carcinoma, and our “Skin/Dermis” category contains a higher percentage of cells from Asian donors because it includes the two nasopharyngeal cell lines NP460 and NPC43, both derived from Chinese males. While higher disease prevalence in some populations might explain why certain tissue types have more representation from particular ancestral groups, another explanation is simply that some cell lines are used more frequently than others. For example, the “Reproductive” category contains more donors of African descent than any other category because it includes cervical cancer cells and is therefore dominated by the widespread use of HeLa cells.

Fig. 2

Detailed analysis of the racial background of primary and immortalized cells by (a) tissue type and (b) reporting journal. If the same commercially available cell line was used multiple times among the articles surveyed, each instance of use is counted.

Among the journals we investigated in the specified six-month window, only the Journal of Biomedical Materials, the Journal of Translational Medicine, and Scientific Reports published articles reporting the race of primary cell donors, and across the three journals these donors comprised only six White donors and one African American donor (Fig. 2b). Other journals, such as Advanced Healthcare Materials and ACS Biomaterials Science and Engineering, showed no reporting of race, although the races or ancestries of several cell lines used in their articles can be found by searching public databases.

Although our review covers only six months and ten journals, we believe these results are likely representative of the field. Our study highlights the lack of reporting on race and ancestry in biomedical in vitro work, and this gap must be addressed. To better represent the U.S. population, researchers must consider race and ancestry when experimenting with human cells in vitro. Genetic ancestry in particular can be determined by genotyping and should be reported whenever human cells are used in biomedical and biomaterial research.

The Consideration of Cell Vendors

Human cells can be obtained from many vendors around the world, and researchers may select either primary cells or cell lines for their research. These vendors, such as ATCC, Lifeline Cell Technologies, Cell Applications Inc., the European Collection of Cell Cultures, and Thermo Fisher Scientific, provide certificates of analysis for their cell products. Although most of these vendors do not report donor ancestry on their websites, product descriptions, or specification sheets, researchers can contact them to place a customized order for a specific ancestry. Additionally, some vendors, including ATCC, Cell Applications, and Lifeline Cell Technologies, include ancestry in their certificates of analysis. ATCC is a notable exception in that it reports donor ancestry on its website for most of its products. Lifeline Cell Technologies may provide all donor specifications, including ancestry, if contacted. Cell Applications does not require ancestry to be recorded when acquiring and specifying cells; however, it will locate the donor’s ancestry for a given product upon request. Thermo Fisher reports ancestry per batch on its products’ certificates of analysis. Interestingly, we observed that ancestry is specified more often for cell lines than for primary cells.

Because cell vendors do not routinely report donor ancestry for all of their cells, researchers find this information difficult to obtain. We suggest that companies adopt protocols requiring them to obtain and readily release critical de-identified donor information to customers, including documenting the donor’s ancestry for every cell product in an easy-to-find location. Requiring cell vendors to include this information would allow researchers, especially those focusing on precision medicine applications, to make informed decisions when acquiring cells for their studies. Reporting ancestry would also bring greater transparency to which cells are studied in biomedical research.

Consider Ancestry: Opportunities to Reverse This Trend

The ancestry of cells to be used for in vitro experiments should be carefully considered at the design stage of the work. Genetic and epigenetic differences between cells from donors of different ancestries should be recognized as important contributors to the cells’ responses to stimuli and should no longer be ignored. Incorporating information on the ancestral background of the cells used in the lab will lead to better clinical translation and help pave the way for more personalized medicine.