
hiPSC-based models offer the promise of facilitating discovery and functional validation of genetic variants regulating gene expression, particularly those underlying the processes of neurodevelopment. In this issue1, Jeremy Schwartzentruber and colleagues present the first map of regulatory variants in hiPSC-derived neurons, generated by differentiating a large cohort of hiPSCs to a sensory neuronal fate.

Genetic studies increasingly demonstrate that a combination of rare and common variants underlies many complex diseases. Identifying and functionally validating the small and frequently context-dependent (cell-type-specific and/or treatment-dependent) effects of these variants is necessary to untangle how common risk factors interact within the diverse cell types of the human body. The Genotype-Tissue Expression (GTEx) project aims to explore the functional role of common variants by characterizing variation in gene expression levels across individuals and tissues2, while the CommonMind Consortium (CMC)3 is applying similar methodologies to understand how genetic variation in the human brain contributes to psychiatric disease (http://commonmind.org/). These studies should be supplemented by hiPSC-based strategies that can generate additional cell types and/or developmental time points that are relevant to disease predisposition but not readily available in post-mortem datasets. Additional advantages of hiPSC-based strategies are that they avoid confounding by post-mortem RNA decay and by the donor's lifetime environmental exposures. Of course, hiPSC-based models are only as valuable as their ability to retain donor-specific gene expression signatures.

Donor effect and variance

As the costs of reprogramming, validating and differentiating hiPSCs decline4, independent efforts have established diverse collections from healthy and diseased donors, making it possible to estimate the overall sources of variance (both inter-individual and intra-individual) in hiPSC-based studies. While recent work from the Next Generation Genetic Association Studies (NextGen) Consortium has begun to characterize the genetic5,6 and epigenetic7 basis of variation between hiPSC lines and differentiated progeny such as hiPSC-derived adipocytes8, hepatocyte-like cells8,9 and cardiomyocytes10, it remains unclear to what extent these studies inform our understanding of common variants associated with neuropsychiatric disease. Inter-individual (i.e., genetic) differences explain much of the transcriptional variability between hiPSCs5,11,12, with separate studies reporting that donor effects explain a median of ~6%13 and 48.8%5 of expression variation between hiPSCs. Yet the retention of donor-specific expression signatures appears to be weaker in differentiated progeny: we have observed donor effects explaining 2.2% of expression variation in hiPSC-derived neurons14. Consistent with this, Schwartzentruber et al.1 report that more variation was explained by neuron differentiation batch (24.7%) than by donor and reprogramming effects in aggregate (23.3%). Of course, the extent to which other hiPSC-derived cell types retain the donor signal remains an open question.
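To make the notion of a "donor effect" concrete, the sketch below estimates, for a single gene, the fraction of expression variance attributable to donor using a one-way random-effects ANOVA (method-of-moments estimator). This is a simplified stand-in assumed for illustration: real analyses usually model several factors at once (for example, donor, reprogramming and differentiation batch), and the function name and toy values here are hypothetical.

```python
from statistics import mean


def donor_variance_fraction(groups: dict[str, list[float]]) -> float:
    """Fraction of one gene's expression variance attributable to donor.

    Hypothetical helper: `groups` maps a donor ID to expression values from
    that donor's replicate lines or differentiations (at least 2 donors, and
    more values than donors). One-way random-effects ANOVA; a negative
    donor-variance estimate is truncated at zero.
    """
    values = [x for xs in groups.values() for x in xs]
    n_total = len(values)
    k = len(groups)
    grand_mean = mean(values)

    # Between-donor and within-donor sums of squares.
    ss_between = sum(len(xs) * (mean(xs) - grand_mean) ** 2 for xs in groups.values())
    ss_within = sum((x - mean(xs)) ** 2 for xs in groups.values() for x in xs)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective group size n0 accounts for unequal replicates per donor.
    n0 = (n_total - sum(len(xs) ** 2 for xs in groups.values()) / n_total) / (k - 1)

    var_donor = max(0.0, (ms_between - ms_within) / n0)
    total = var_donor + ms_within
    return var_donor / total if total > 0 else 0.0


# Toy example: two differentiations per (hypothetical) donor.
toy = {"donor_A": [1.0, 1.2], "donor_B": [3.0, 3.2], "donor_C": [5.0, 5.2]}
print(donor_variance_fraction(toy))  # ~0.995: donor means differ far more than replicates
```

When differentiation noise is large relative to differences between donors, `ms_within` dominates and the donor fraction shrinks, which is the pattern described above for hiPSC-derived neurons.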

Furthermore, Schwartzentruber et al.1 detail evidence of high differentiation-induced variability in hiPSC-derived neuronal cultures. While reprogramming seems to maintain donor effects11, directed neuronal differentiation remains a somewhat variable process that can be perturbed by the initial hiPSC culture conditions as well as by differentiation batch effects. This leads to variation within and between differentiations, even for cells from the same individual. To characterize the resulting cellular heterogeneity, they performed single-cell RNA sequencing of hiPSC-derived sensory neurons from one donor, showing that while 63% of cells formed a tight cluster expressing sensory neuronal genes, the remaining 37% expressed genes more typical of fibroblasts. Using computational deconvolution, they found that the level of this fibroblast-like signature varied across hiPSC-derived neuronal samples in their data, even between multiple differentiations of the same hiPSC line. This is consistent with our own finding that variation in cell type composition between neuronal differentiations is driven by a fibroblast-like signature14. Moreover, they show that genes significantly upregulated following neuronal differentiation from hiPSCs, including ones critical to neuronal function, were the most variable. This high expression variability in hiPSC-derived neurons, which are not fully identical to their in vivo counterparts, is an important caveat that limits the power of these models.
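The idea behind deconvolution can be illustrated, under strong simplifying assumptions, as a two-component mixture: each bulk neuronal expression profile is modelled as a blend of a sensory-neuron reference and a fibroblast-like reference (which could, for instance, be averaged from the two single-cell clusters), and the fibroblast-like proportion is taken as the least-squares mixing weight clipped to [0, 1]. This is not the deconvolution method used by Schwartzentruber et al.; the function name and reference vectors are hypothetical.

```python
def fibroblast_fraction(bulk: list[float], neuron_ref: list[float],
                        fibroblast_ref: list[float]) -> float:
    """Least-squares fibroblast-like proportion p in a two-component mixture.

    Hypothetical helper modelling bulk ~ (1 - p) * neuron_ref + p * fibroblast_ref
    over the same ordered list of genes. Minimizing the squared error gives
    p = <bulk - neuron, fib - neuron> / ||fib - neuron||^2, clipped to [0, 1].
    """
    diff_ref = [f - n for f, n in zip(fibroblast_ref, neuron_ref)]
    resid = [b - n for b, n in zip(bulk, neuron_ref)]
    denom = sum(d * d for d in diff_ref)
    if denom == 0:
        raise ValueError("reference profiles are identical; proportion is undefined")
    p = sum(r * d for r, d in zip(resid, diff_ref)) / denom
    # Proportions outside [0, 1] are not meaningful, so clip them.
    return min(1.0, max(0.0, p))


# Toy three-gene profiles (hypothetical values).
neuron_ref = [10.0, 0.0, 5.0]
fibroblast_ref = [0.0, 10.0, 5.0]
bulk = [7.0, 3.0, 5.0]  # constructed as 70% neuron + 30% fibroblast-like
print(fibroblast_fraction(bulk, neuron_ref, fibroblast_ref))  # 0.3
```

Practical deconvolution tools typically handle many cell types with non-negativity constraints; the two-reference case is shown only because it has a closed-form solution.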

hiPSC-based eQTL analyses

Despite the limitations of hiPSC-based studies that they delineate so clearly, Schwartzentruber et al.1 successfully applied allele-specific methods to map 1,403 expression quantitative trait loci (eQTLs) and 6,318 chromatin accessibility QTLs (caQTLs) at a false discovery rate (FDR) of 10%. As in the NextGen eQTL studies5,6, these hiPSC-based eQTL analyses confirmed in vivo findings reported in GTEx, but also discovered novel eQTLs missed by tissue-level analyses. These positive findings reflect the larger effect sizes of eQTLs relative to those of other variables of interest, such as disease status or electrophysiological properties.
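Reporting QTLs "at an FDR of 10%" means controlling the expected proportion of false discoveries among the loci called. A common way to do this is the Benjamini–Hochberg step-up procedure, sketched below on hypothetical per-test P values; the FDR procedure actually used in the study (for example, one based on permutations) may differ.

```python
def benjamini_hochberg(p_values: list[float], fdr: float = 0.10) -> list[int]:
    """Indices of tests declared significant by the Benjamini-Hochberg procedure.

    Sort P values ascending, find the largest rank k with p_(k) <= (k / m) * fdr,
    and reject the k smallest. Returns the original indices in ascending order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank that meets its threshold,
        # even if some smaller ranks did not.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])


# Hypothetical P values for ten candidate variant-gene tests.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, fdr=0.10))  # [0, 1, 2, 3, 4]
```

Note that tests 2 and 3 miss their own rank thresholds but are still called, because the fifth-ranked P value passes; this step-up behaviour is what distinguishes the procedure from a simple per-rank cutoff.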

On the basis of the degree of expression variation observed in this dataset, Schwartzentruber et al.1 estimate that recall-by-genotype studies using hiPSC-derived neurons will require at least 20–80 unrelated individuals to detect even moderately large effects of regulatory variants. This is a helpful insight, emphasizing the need to further increase the overall size of hiPSC-based studies of complex genetic disease, even at the cost of eliminating replicate hiPSC clones for any given individual14,15. Moreover, it underscores an urgent need to develop isogenic models to query the functional impact of common variants, many of which are not conserved in rodents and must therefore be studied in human cells.
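Why expression noise drives up the number of donors can be illustrated with a textbook normal-approximation power calculation for comparing two genotype groups. This is an assumption-laden sketch, not the authors' estimate: it treats the variant's effect as a standardized mean difference (the expression difference divided by the standard deviation across individuals, which would include differentiation-induced noise), assumes equal group sizes and a two-sided test, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def individuals_per_genotype_group(effect_size_sd: float, alpha: float = 0.05,
                                   power: float = 0.8) -> int:
    """Individuals needed per genotype group (normal approximation, two-sided).

    Hypothetical helper implementing n = 2 * (z_{1 - alpha/2} + z_{power})^2 / d^2,
    where d is the expression difference between groups in standard-deviation units.
    """
    if effect_size_sd <= 0:
        raise ValueError("effect size must be positive")
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    n = 2 * (z_alpha + z_power) ** 2 / effect_size_sd ** 2
    return math.ceil(n)


print(individuals_per_genotype_group(1.0))  # 16 per group
print(individuals_per_genotype_group(0.5))  # 63 per group
```

Because the effect is standardized, any extra variance from differentiation batch shrinks the effective effect size: halving it from 1.0 to 0.5 standard deviations roughly quadruples the number of individuals required under these defaults.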

Overall, these findings raise concerns about cell type heterogeneity and the associated expression variation across multiple differentiations, which currently limit the power of the hiPSC platform. This poses an important challenge for the field when selecting differentiation or induction protocols for hiPSC-based molecular and cellular analyses (Fig. 1). Current differentiation protocols tend to be evaluated on the basis of cellular yield (i.e., the percentage of cells positive for one or more cell-type-specific markers) across a handful of hiPSC lines. Moving forward, it will be critical to assess expression variance between differentiations using cells from the same and different individuals, to test the extent to which the donor effect is conserved. The power of hiPSC-based models for molecular and phenotypic studies of disease risk depends on this retention of the donor-specific component of gene expression.

Fig. 1: hiPSC-based studies of disease risk depend on the retention of a donor-specific component of gene expression.

The schematic illustrates the predicted retention of the donor signature through reprogramming and subsequent neuronal differentiation or transcription factor–mediated neuronal induction. Whereas current methods focus on variability in yield between hiPSC lines, future approaches should also identify the methodology that best retains the donor signature in hiPSC-derived neurons (inset).