1 Introduction

Tissue-specific embryonic progenitors are important gateway cell populations at various stages of organ development [49]. Within the endoderm germ layer, a substantial body of work has deciphered the emergence of domain-specific primordial progenitors, including signaling requirements for tissue specification, and has uncovered key transcription factors (TFs), such as Nkx2-1 for the lung and thyroid domains, Pdx1 for the pancreatic domain, and Cdx2 for the intestinal domain, for tissue specification or development or both [30, 52, 55]. Most importantly, self-renewing, transplantable, multipotent lung progenitors are of value in lung regenerative medicine; mechanistic understanding of their cell fate decisions will be key for their successful clinical application.

In this chapter, we review the biology of multipotent embryonic progenitors of the respiratory system and recent efforts to employ such progenitors for the modeling and study of cell fate decisions in ex vivo organotypic models of human lung development (Fig. 4.1). We also assess efforts to derive de novo lung progenitors of different developmental stages from pluripotent stem cells (PSCs). As the areas of mammalian developmental and PSC biology continuously inform each other [96], there is an increasing need for computational models to evaluate and quantify the similarity between embryonic and PSC-derived progenitors. We critically appraise the nascent literature on cell similarity models, including their limitations and underlying assumptions.

Fig. 4.1 Major types of progenitor cells in the developing lung (lung primordium, pseudoglandular stage, and canalicular stage; ex vivo and in vitro) and their in vitro engineered counterparts. SOX2 is expressed only in human distal lung tips.

2 Overview of Embryonic Lung Progenitors

2.1 Stage-Specific Epithelial Progenitors (Primordial, Distal Tip, Basal)

The respiratory system is specified within the anterior ventral foregut endoderm (the anterior part of the gut tube) at around embryonic day (E)9.0 (18–22 somites) in mice and 4 weeks of gestation in humans [48, 102]. During subsequent stages (embryonic, pseudoglandular, canalicular, saccular, and alveolar), developmental processes, such as tracheoesophageal septation, formation and elongation of the primary lung buds, branching morphogenesis, and alveologenesis, lead to the anatomically and functionally distinct compartments of the adult lung that perform the basic functions of air conduction, mucociliary clearance, and gas exchange. These processes are concurrent with and inseparable from diversification of cell fates driven by stage-specific embryonic progenitors. Genetic alteration or ablation of such progenitors can lead to profound developmental defects, such as lung agenesis and mispatterning of the lung proximal-distal axis.

2.1.1 Lung Primordial Progenitors

The lung epithelial primordial progenitors are the first epithelial cells within the developing gut tube to adopt a lung fate at around E9.0 in the mouse. The appearance of this rare, transient progenitor population is marked by expression of the homeodomain TF Nkx2-1 [48, 54]. Nkx2-1+ lung primordial progenitors are multipotent progenitors and give rise to the vast majority of lung epithelial lineages (both airway and alveolar), as shown by a lineage tracing study performed in our laboratory [48]. Although Nkx2-1 is not necessary for lung specification [55], it is the earliest marker of lung epithelial fate, and its expression has been used extensively to benchmark PSC-derived lung multipotent progenitors, as discussed below. Remarkably, ventx2, a Xenopus homeodomain TF, was recently shown to be expressed in the prospective respiratory domain at Nieuwkoop and Faber (NF)31/32 stage, earlier than Nkx2-1 (NF33/34), raising the tantalizing possibility that it is the earliest marker of respiratory (tracheal and lung) fate in Xenopus and potentially in humans [90].

Given the inaccessibility of the lung primordial population in humans, most information on the molecular pathways regulating its emergence and potential gene regulatory networks derives from mouse and Xenopus studies, both in vivo and ex vivo [15, 48, 89, 102]. Xenopus (Xenopus laevis, African clawed frog) has provided an excellent model of foregut development that complements the mouse, owing to highly conserved endoderm formation, rapid development, and facile acquisition of large numbers of embryos, offering the possibility of medium-to-high-throughput loss-of-function studies [62, 89]. Mesoderm-derived Wnt signaling is indispensable for lung specification [32, 38], and Bmp signaling is necessary for correct spatial restriction of respiratory-fated progenitors and tracheoesophageal separation [26]. Retinoic acid (RA) signaling confers lung competence to endodermal cells, which can then respond to Wnt and Bmp signals [87, 88]. The minimal signaling requirements for lung endodermal fate have also been demonstrated by pathway analysis comparing the transcriptomes of flow-sorted lung primordial progenitors and prespecified endoderm, which showed upregulation of Wnt and Tgf-β superfamily pathways in the former population [48].

In stark contrast to other Nkx2-1+ embryonic tissues, such as thyroid and forebrain [24, 97], little else is known regarding the gene regulatory program underlying the singular lung primordial identity. Apart from Nkx2-1, pan-endodermal TFs, such as Foxa2 [99], and TFs with known roles in later lung development, such as Irx2 and Foxp2 [36, 48], are also expressed in the lung primordium. Future work will certainly decipher TF interactions that uniquely define the lung primordial identity and are involved in early epithelial cell fate decisions in lung development.

2.1.2 Distal Tip Progenitors

Following respiratory system specification, its anterior-posterior axis is rapidly formed as witnessed by both morphological changes, such as trachea formation and elongation and emergence of primary lung buds, and mutually exclusive expression of the TFs Sox2 and Sox9 in the murine tracheal and lung domains [4]. During the ensuing pseudoglandular and canalicular stages, highly proliferating progenitors at the distal tips of the developing lung are essential for branching morphogenesis and formation of airway and alveolar epithelia [4, 81, 91]. Extensive work in mouse genetic models and, most recently, in human fetal organoids has shed light on the molecular profile of distal tip progenitors, their competence, and gene regulatory networks.

In mice, distal tip progenitors possess a distinct transcriptional profile with high expression of a constellation of TFs, such as Sox9, Id2, Nmyc, and Etv5 [4, 44, 61, 67, 77, 80, 91]; matrisomal genes such as Thbs1, Lama3, and Fbn2 [67, 81, 95]; and cell surface markers such as Clu [12, 81]. Interestingly, human distal tip progenitors also express SOX2, which is considered a proximal airway marker in murine development, during the pseudoglandular stage, but its distal expression is extinguished at the canalicular stage [22, 72, 78]. Lineage tracing studies have demonstrated that tip progenitors have a broad differentiation repertoire during the pseudoglandular stage (as early as E11.5), giving rise to major lung epithelial lineages (club, ciliated, neuroendocrine, and alveolar type I (AT1) and type II (AT2) cells) [91], while at later time points (E16.5) their differentiation potential is restricted to alveolar epithelial types. Nevertheless, late tip progenitors appear to possess some degree of plasticity since they can give rise to airway progeny when grafted in cultured E12.5 lung explants [61]. Further insights into the developmental potential of distal tip progenitors have been derived by combining RNA-sequencing (RNA-seq) with assay for transposase-accessible chromatin using sequencing (ATAC-seq) at two developmental time points, E11.5 and E16.5 [53]. Remarkably, almost 25% of open chromatin regions were specific to each time point, probably reflecting the time-dependent competence of tip progenitors. This integrated analysis identified increased expression and chromatin accessibility of PI3K signaling pathway components at the later time point, and inhibition of this pathway at an early time point increased the percentage of SOX9+/NKX2-1+ lung epithelial progenitors.

Integration of several pathways, such as β-catenin, Fgf, and Bmp signaling, ensures the proliferation and maintenance of the distal tip progenitor identity, as has been demonstrated in numerous genetic mouse studies. β-catenin-dependent Wnt signaling, which is indispensable for lung specification, is also required to maintain distal tip progenitors, by controlling Sox9 and Nkx2-1 expression and acting downstream of Fgf/Kras signaling [81]. Precise modulation of β-catenin levels is critical for respiratory identity of tip progenitors, as either deletion or hyperactivation of β-catenin leads to the ectopic expression of gastrointestinal or liver genes within the developing respiratory epithelium [79, 81]. Glucocorticoid signaling, which is necessary for differentiation of AT2 cells [31, 105], also dictates, at late stages of respiratory development, the formation of the proximal-distal boundary, namely, the bronchoalveolar duct junction (BADJ). Premature activation or deletion of glucocorticoid signaling leads to proximal or distal shift of the BADJ, respectively [4]. Glucocorticoid signaling and STAT3 signaling, activated by LIF and other IL6 family ligands, can act in concert to promote alveolar differentiation of distal tip progenitors from the canalicular stage [61].

Murine distal tip progenitors have a critical role in lung epithelial differentiation and morphogenesis. Information gleaned from developmental studies has propelled the development of lung-directed differentiation protocols from human and mouse PSCs, and ex vivo organoid culture of such human fetal progenitors has provided valuable insights into similarities and differences in respiratory development between the two species, as detailed below.

2.1.3 Airway Basal Cells

An important stem cell population in the adult respiratory system is the P63+/KRT5+ airway basal cell, which can self-renew and differentiate to a wide variety of proximal cell types, such as secretory, ciliated, tuft, ionocyte, and neuroendocrine [6, 93]. There is a clear species difference in basal cell distribution, as basal cell-containing pseudostratified epithelium is limited to the trachea and mainstem bronchi in mice but extends more distally, spanning several airway generations, in humans [94]. Recent work has shed light on the developmental origins of this population [73, 112]. Intriguingly, dispersed P63+ cells are already present in the Nkx2-1GFP+ lung primordium (around E9.0–9.5) and can give rise to cells in both the airway and alveolar compartments [112]. Epithelial P63+ cells are rapidly restricted to the tracheal domain by E10.5 with concomitant lineage restriction to tracheal and proximal airways. Lineage tracing studies at later time points indicate that the pool of embryonic progenitors to E18.5 p63+Krt8- prebasal cells is established as early as E13.5–E14.5, as cells labeled at this time point comprise more than 80% of perinatal prebasal cells [112]. ITGB4hi multipotent progenitors have been described in early developing trachea (E14.5), but it is not clear to what extent these progenitors overlap with the P63+ progenitor pool [11]. In human fetal lungs, P63+/KRT5+ cells, considered bona fide basal cells, are present in trachea and mainstem bronchi by 12 weeks [73]. Interestingly, P63+/KRT5- cells in non-cartilaginous and small-cartilaginous airways have nuclear pSMAD staining, and they may be the human equivalent of basal cell progenitors described in the study of Yang et al. [112].

2.2 Stage-Specific Mesenchymal Progenitors

Mesenchymal-epithelial interactions are indispensable at every stage of lung development, from lung epithelial specification to branching morphogenesis. For example, different degrees of disruption of endodermally derived Shh signaling to splanchnic mesoderm lead to severe developmental effects, such as lung hypoplasia and tracheoesophageal fistula or even lung, tracheal, and esophageal agenesis [66, 74]. On the other hand, FGF10 expressed in the lung epithelium-adjacent mesoderm is important for bud induction [15], whereas, at later stages, mesenchymal FGF10 expression in the vicinity of distal tips is part of a complex signaling circuit that regulates tip outgrowth and proliferation [10, 63].

As the lung develops, the mesenchyme is also subject to a diversification of fates, leading to the various mesoderm-derived cell populations in the adult lung, such as vascular and airway smooth muscle cells, endothelial cells, pericytes, and pleura. Lineage tracing studies alone or in combination with new high-resolution genomic methods, such as single-cell RNA-seq (scRNA-seq), have allowed for the characterization of various mesenchymal progenitor populations and their developmental potential, as comprehensively reviewed in Riccetti et al. [92]. For example, lineage tracing using different gene drivers has shown that several lung mesenchymal lineages, such as vascular and airway smooth muscle cells, proximal endothelial cells, and Pdgfrβ+ pericyte-like cells, originate from Wnt2+/Gli1+/Isl1+ cardiopulmonary progenitors [82]. Systems that employ Tbx4 lung-specific enhancers have also been described and used for lineage tracing (as Cre or rtTA drivers) of mesenchymal progenitor progeny at various windows during lung development [59, 115].

Zorn et al. have recently used scRNA-seq to describe signaling networks governing interactions between mesenchymal and epithelial domains in early foregut development (E8.5–E9.5) [36]. This study revealed several mesenchymal cell clusters corresponding to domain-specific mesenchymal populations that engage in signaling crosstalk with the corresponding epithelial populations. In the developing lung mesenchyme, there was expression of previously described lung mesenchymal markers, such as Wnt2 [32], Osr1 [86], Foxf1 [19, 84], and Tbx5 [13]. Nkx6-1, a known pancreatic epithelial TF, was also expressed in the respiratory mesenchymal domain (both by transcript and protein), and it was one of the key TFs on the cell-state trajectory from foregut lateral mesoderm to lung mesenchyme [36].

Similar transcriptomic characterization of the developing human lung mesenchyme was performed using fetal human tissue from various endodermal organs, including the lung, from 7 to 21 weeks postconception [113]. As predicted, human lung mesenchyme was quite different from intestinal and stomach mesenchyme, with WNT2 being one of the most differentially expressed genes. The TF PRRX1 marked chondrocyte-like cells and pericytes in the lung. Furthermore, distal tip mesenchymal progenitors were also identified using human lung fetal tissue from pseudoglandular to canalicular stages [43]. These progenitors were RSPO2+ and could be purified based on LIFR expression. The RSPO2+ progenitors were found to maintain the distal tip undifferentiated state by epithelial LGR5-mediated Wnt signaling [43].

3 Ex Vivo Culture of Multipotent Embryonic Lung Progenitors

As discussed in the previous section, use of mouse models has been instrumental in our understanding of lung embryonic progenitors, including their differentiation repertoire and signaling pathways that control their specification and self-renewal. Embryonic explant culture has provided another powerful tool to study early lung development, due to ease of manipulation of the soluble environment (including activators and inhibitors of developmental pathways) and molecular characterization of the explants [15, 16, 25, 61]. Recently, lung developmental studies have received considerable impetus by the ex vivo expansion and differentiation of fetal respiratory progenitors that are phenotypically and functionally similar to their in vivo counterparts. Ex vivo progenitor organoids combined with state-of-the-art technologies, such as multimodal single-cell profiling and clustered regularly interspaced short palindromic repeats (CRISPR)-mediated gene editing [50], are already enabling high-resolution studies of lung embryonic progenitor biology.

3.1 Ex Vivo Culture of Mouse Embryonic Progenitors

As embryonic progenitors are transient populations that exist within defined developmental windows, their capture and ex vivo expansion and multilineage differentiation offer a powerful, tractable system to study developmental questions and develop organoid platforms. While ex vivo culture of endodermal progenitors, such as pancreatic and hepatic progenitors, has been reported for some time now [35, 109], only recently has culture of respiratory progenitors been established. Nichane and coworkers were able to purify mouse Sox9+ distal tip progenitors using a Sox9 fluorescent reporter strain and develop defined culture conditions (lung progenitor medium, LPM) for their ex vivo expansion [77]. 3D culture in Matrigel, activation of Wnt signaling by GSK-3β inhibition (CHIR99021), inhibition of Tgf-β and p38 signaling, and addition of Fgf (FGF9, FGF10) ligands and EGF were necessary to maintain E12.5 tip progenitors in an undifferentiated, multipotent state, with high expression of known tip markers, such as Id2, Nmyc, Etv5, and Sox9. Importantly, ex vivo expanded progenitors were able to differentiate to airway and alveolar lineages both in vitro and after transplantation to the mouse lungs following naphthalene or bleomycin injury [77]. Interestingly, the independent and contemporaneous work of Spence et al. identified FGF7, CHIR99021, and RA (3F medium) as the minimal signaling requirements for ex vivo maintenance of distal tip progenitor cells from E12.5 mouse lungs [72]. A similar system established earlier in lung development (E11.5) was employed for a genetic screen of Sox9+ tip progenitor regulators [2]. This screen identified Aurkb, among others, as a regulator of tip progenitor cell cycle, and its relevance was confirmed by in vivo deletion in distal tip progenitors, leading to profound defects in branching morphogenesis.

In our work, we were able to isolate rare E9.0 lung primordial epithelial progenitors using the previously published Nkx2-1GFP reporter mouse [48, 68]. scRNA-seq of in vivo and ex vivo expanded lung primordial progenitors guided modifications of the distal tip expansion medium, thus allowing us to successfully culture primordial progenitors that proliferated and maintained Nkx2-1GFP expression for several passages [64]. This ex vivo system can be used to study cell fate decisions during the embryonic stage of lung development.

3.2 Ex Vivo Culture of Human Embryonic Progenitors

The need for ex vivo systems is even more salient with human fetal progenitors due to the relative inaccessibility of human fetal tissue and severe restrictions on related research in certain regions [3]. The possibility of culturing ex vivo human distal tip progenitors was first shown by Rawlins et al. [78]. Use of a cocktail of growth factors (EGF, FGF7, FGF10), Tgf-β (SB431542) and BMP (Noggin) inhibitors, and Wnt activators (CHIR99021) and potentiators (R-spondin 1) resulted in long-term culture of mesenchyme-depleted tip progenitors from 5- to 9-week fetal lungs. There was an absolute requirement for activation of FGF and Wnt and inhibition of Tgf-β signaling for the self-renewal of the progenitors. The multilineage differentiation potential of the ex vivo expanded progenitors was convincingly demonstrated in a series of both in vitro (PneumaCult for bronchiolar differentiation, alveolar differentiation medium with expanded human lung fetal fibroblasts) and in vivo (kidney capsule co-transplantation with E13.5 mouse fetal cells) assays. Miller et al. showed that the 3F medium (FGF7, CHIR99021, RA) described for the expansion of the mouse distal tip progenitors, as well as the 4-factor medium (FGF7, FGF10, CHIR99021, RA), were sufficient for the ex vivo stable culture of 12-week dissected human lung distal tip progenitors [72]. The expanded progenitors had low expression of airway markers, such as P63, FOXJ1, and SCGB1A1, and high co-expression of tip markers, such as SOX9 and SOX2. Notably, expanded progenitors had significant transcriptional similarity with freshly isolated fetal lung tips. Although BMP4 removal resulted in higher SOX9 expression in Miller et al. [72], similarly to BMP inhibition in Nikolic et al. [78], the absence of a Tgf-β inhibition requirement in the former protocol is intriguing, and it may be explained by subtle differences between the two systems arising from the different developmental times of the tissue of origin, including endogenous TGF-β levels.

Establishing tractable systems of human fetal lung progenitors opens up the exciting possibility of ex vivo human development studies, as shown by recent work of Rawlins et al. and Spence et al. [18, 65, 100, 101]. Engineering a CRISPRi system in human distal tip progenitors, Sun and coworkers were able to perform loss-of-function studies and systematically screen the role of several TFs in self-renewal of SOX9+ progenitors [100]. For example, MYBL2 (proliferation-related gene) and CTNNB1 (Wnt signaling effector) guide RNAs were highly depleted in a 2-week culture of organoids, indicating a role of these genes in progenitor self-renewal. Knockdown of SOX9 resulted in downregulation of 455 common genes in two organoid lines, and further gain-of-function and genomic occupancy studies identified ETV4, ETV5, and NMYC and Wnt-related genes, such as LGR5 and CD44, as direct transcriptional SOX9 targets. Removal of Wnt activators from the tip medium resulted in SOX9 expression loss. Interestingly, genes suppressed by SOX9 included secretory cell-related genes, as well as non-lung (stomach, liver) endodermal genes, similarly to mouse respiratory development [81].

Human distal tip progenitors from a later (canalicular) stage were used to study alveolar differentiation due to their restricted differentiation potential [65]. CD36 was specifically expressed in these progenitors and, along with CD44, marks distal tip progenitors from 16- to 21-week human fetal lungs. CD36/CD44 co-expression also has functional significance, as CD36+CD44+ (LinPos) organoids had greater propensity to differentiate to alveolar cells. In this system, NKX2-1 and TFAP2C were found to regulate opposing fates, with the former TF promoting alveolar and the latter promoting airway (basal) fate, as shown by gain-of-function studies.

Distal tip organoids have also been used to interrogate lineage relationships, focusing on newly identified progenitors [18]. In scRNA-seq of the developing human lung, SCGB3A2/SFTPB/CFTR co-expression identified a progenitor cell, named the fetal airway secretory (FAS) cell. FAS cells emerged during airway differentiation [72] of distal tip progenitors and were shown, following barcoding-mediated lineage tracing, to give rise to pulmonary neuroendocrine cells and a subset (C6+) of multiciliated cells.

Overall, culture of lung fetal progenitors has offered important insights into lung epithelial development and provided sophisticated systems that are being used to study the intricate relationships between progenitor competence, gene regulatory networks, and signaling pathways.

4 In Vitro Derivation of Multipotent Embryonic Lung Progenitors

A large body of work reviewed recently [29, 104] has led to the development of directed differentiation protocols for the derivation of a variety of lung lineages from both mouse and human PSCs. It is now well established that derivation of lung-competent anterior foregut endoderm via dual Smad inhibition followed by Wnt and Bmp signaling can generate lung multipotent primordial-like progenitors in vitro [27, 33, 34, 39, 47, 48, 68, 75]. These progenitors can be highly purified by the use of either knock-in fluorescent reporters, such as eGFP and mCherry, or cell surface marker sorting algorithms, such as CPMhi and CD47hi/CD26lo [33, 39, 48, 68, 98, 110]. Interestingly, Cd26 (Dpp4) is one of the most downregulated genes upon in vivo lung specification in mice [48]. Wnt signaling is also important post-specification in vitro with timing of Wnt addition or withdrawal leading to different differentiation trajectories (airway or alveolar) [23, 51, 58, 71].

As PSC-based systems have been further refined, one major goal is to derive and stably expand in culture other important lung epithelial and mesenchymal progenitors. Earlier efforts had led to the long-term culture of complex PSC-derived organoids that initially contained tip progenitors and appeared to recapitulate the transcriptional profile of second-trimester human lung [17]. At the same time, there is an imperative need to develop computational methods to reliably benchmark such populations against their presumed in vivo counterparts or ex vivo expanded embryonic progenitors, as discussed in the following section. Creation of human or mouse developmental atlases at single-cell resolution [41, 73, 76, 113] can satisfy this need while also driving the study of developmental trajectories and cell fate decisions.

Hawkins and coworkers reported the derivation and isolation of airway basal-like cells (iBCs) from human PSCs using a bi-fluorescent (NKX2-1GFP; TP63tdTomato) reporter cell line [40]. These cells were expandable in airway media (PneumaCult-Ex) containing dual Smad inhibitors and rapidly acquired and maintained expression of the airway basal cell marker, NGFR [93]. Furthermore, their airway multilineage potential (ciliated, secretory, goblet) was demonstrated in both in vitro air-liquid interface (ALI) culture and in tracheal xenografts.

The derivation of stable distal tip-like progenitors from PSCs has also been reported [42, 45]. Refinement of a previously published protocol for the derivation of mouse lung primordial progenitors [48] and subsequent use of the LPM medium [77] led to the stable expansion of a progenitor population that maintained Nkx2-1mCherry expression and also expressed distal tip markers (Sox9, Id2) [45]. Remarkably, these PSC-derived progenitors were able to engraft (up to 15 weeks) in the distal lung, following transplantation into bleomycin-injured, syngeneic, immunocompetent recipients. The transplanted progenitors differentiated into AT1- and AT2-like cells that were highly similar to their endogenous counterparts as evidenced by protein expression of mature markers, such as PDPN and proSFTPC, and high similarity scores using a novel scRNA-seq-based computational method (single-cell type order parameters, scTOP) [45, 111]. Similarly, human PSC-derived lung progenitors adopted a bud tip fate in a previously described medium [72] and were expanded over several weeks as bud tip organoids (iBTOs) [42]. These progenitors expressed several markers of human pseudoglandular stage distal tip cells (SOX9, ETV5, ID2, HMGA2), had high computational similarity scores to culture-expanded primary human distal tip lung progenitors, and were able to differentiate to either alveolar or airway cells in the respective media.

The value of PSC-based systems as a discovery/validation tool for hitherto unknown progenitor populations was evident in two publications [8, 70]. McCauley et al. used a human PSC reporter line to derive and isolate SCGB3A2+ early secretory progenitors. By using scRNA-seq, a Wnt-dependent subpopulation of these progenitors was found to co-express an AT2-like program, including genes such as SFTPC, NAPSA, and PGC, and even possess functional lamellar bodies. This in vitro engineered population was further validated in Basil et al. [8] and appears to have high similarity with the newly discovered SCGB3A2+ respiratory airway secretory (RAS) cells in human terminal bronchioles, which are progenitors to AT2 cells and whose alveolar differentiation is regulated by Notch and Wnt signals.

While lung epithelial progenitors are routinely derived from PSCs, derivation of multipotent lung mesenchymal progenitors has lagged behind, owing to the broad expression of several mesenchymal TFs during early development and the dearth of information on the signals specifying lung mesenchyme. Recently, several papers have reported efforts to derive functional multipotent lung mesenchyme from either human or mouse PSCs [5, 36, 56]. In a proof-of-principle study, Han and coworkers gleaned information from their study of coordinated epithelial-mesenchymal development within the early foregut endoderm to derive distinct subtypes of tissue-specific mesenchyme, including lung mesenchyme, from human PSCs [36]. Following derivation of lateral plate mesoderm, they used BMP4, FGF2, and RA with simultaneous inhibition of Wnt and Tgf-β signaling and activation of Shh signaling by purmorphamine (PMA), a Shh signaling agonist, to induce an anterior foregut splanchnic mesoderm fate. Further activation of BMP, Shh, and RA signaling with activation of Wnt at later stages led to the derivation of lung-like mesenchyme, based on co-expression of NKX6-1, TBX5, FOXF1, and WNT2 [57].

Similar work of Morimoto et al. identified β-catenin-dependent Wnt signaling as an important signal for early lung mesodermal Tbx4 expression in the tracheal domain [56]. Lateral plate mesoderm (Foxf1+) was treated with BMP4 and CHIR99021 to produce tracheal mesoderm from mouse PSCs, and the downstream differentiation of this tracheal mesoderm to chondrocytes and smooth muscle cells was demonstrated with immunostaining and qPCR. Induction of tracheal mesoderm with chondrocyte and smooth muscle competence from human PSCs required further addition of PMA.

Alber and coworkers made heavy use of a Tbx4 enhancer Cre mouse PSC line (Tbx4-rtTA; TetO-Cre; mTmG; Tbx4-LER) to indelibly mark and purify PSC-derived lung-specific mesenchyme (iLM) [5, 115]. KDR+ lateral plate mesoderm (also Foxf1+) was treated with various combinations of Wnt, Bmp4, PMA, and RA, and it was determined that RA and PMA were the minimal signaling requirements for derivation of lung-specific mesenchyme, based on the percentage of Tbx4-LERGFP+ cells [5]. The functionality of the engineered iLM was demonstrated by the following: (a) enhanced lung epithelial differentiation in organoid culture with PSC-derived lung progenitors, (b) induction of lung fate (Nkx2-1+ cells) in non-lung epithelial progenitors, and (c) maintenance of primary AT2 cells in organoid coculture. Tbx4-LERGFP+ iLM was also shown to be multipotential and was able to differentiate to smooth muscle cells, adipocytes, and mesenchymal alveolar niche cells (MANCs) [114].

5 Progenitor Cell Similarity Models

A major challenge facing the field is to develop tractable computational methods for assessing cell similarity between engineered and natural cells. An ideal method would assess similarity along multiple phenotypic axes: transcriptomic, epigenetic, metabolomic, etc. [28, 37]. However, such multimodal comparisons remain uncommon due to the difficulty and expense of generating the required datasets. For this reason, the vast majority of cell similarity methods restrict themselves to analyzing transcriptomic measurements made using RNA-seq methods. RNA-seq can occur in bulk, where a multitude of cells in a tissue are sequenced simultaneously, or in single-cell form (scRNA-seq), where individual cells are sequenced. The latter, although more expensive and resource intensive, provides granular resolution of the gene expression of individual cells and has become standard for identifying cell types. For this reason, here, we focus on scRNA-seq-based computational methods for understanding cell similarity (compared and reviewed in Abdelaal et al. [1] and Xie et al. [108]).

Any scRNA-seq-based method for quantifying cell similarity must account for several technical challenges [46]. Single-cell data tends to be extremely sparse, with many low-expressing genes being measured as zero expression (a phenomenon often called “dropout” in the computational literature). The noisy nature of scRNA-seq data is exacerbated by the large amount of cell-to-cell technical variation. These problems become especially acute when comparing data across experiments and biological conditions, where batch effects can dominate variation in scRNA-seq data, making it difficult to assess cell similarity, especially in the context of closely related cell types [83]. This presents a major technical hurdle for utilizing the comprehensive cell atlases now being generated by the community [69, 103].
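The sparsity described above can be illustrated with a minimal simulation. The sketch below is not taken from any of the cited studies: it assumes a simple toy model in which "true" transcript counts are Poisson distributed and each transcript is captured independently with a fixed probability. The function names, the gamma-distributed gene means, and the 10% capture rate are all hypothetical choices made for illustration.

```python
import numpy as np

def simulate_counts(n_cells, n_genes, mean_expr, capture_rate, rng):
    """Draw hypothetical 'true' Poisson counts, then thin them by capture efficiency."""
    true_counts = rng.poisson(mean_expr, size=(n_cells, n_genes))
    # Each transcript is detected independently with probability capture_rate.
    observed = rng.binomial(true_counts, capture_rate)
    return true_counts, observed

def zero_fraction(counts):
    """Fraction of entries in the cell-by-gene matrix that are exactly zero."""
    return float(np.mean(counts == 0))

rng = np.random.default_rng(0)
# Assumed per-gene mean expression: many lowly expressed genes, a few high ones.
mean_expr = rng.gamma(shape=0.5, scale=4.0, size=200)
true_counts, observed = simulate_counts(500, 200, mean_expr, 0.1, rng)

print(f"zeros before capture: {zero_fraction(true_counts):.2f}")
print(f"zeros after capture:  {zero_fraction(observed):.2f}")
```

Under these assumptions, incomplete capture can only remove counts, so the observed matrix is at least as sparse as the underlying one, and lowly expressed genes are the most likely to be recorded as zero.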

To address the noisy nature of scRNA-seq, numerous computational methods have been proposed (many of which are collected in the now indispensable Bioconductor project) [7]. To deal with the excess zeros and data sparsity in scRNA-seq datasets, analysis pipelines often include methods for data imputation (see Dai et al. [20] for a recent comparison of imputation methods). Although data imputation methods vary in their technical details (model-dependent or model-free deep learning methods), they all make use of the expression profiles of similar cells to impute missing values. This process tends to homogenize expression profiles across similar cells, making it difficult to assess small differences in gene expression profiles between cells. Most importantly, for our purposes, this homogenization can lead to inflated cell similarity scores, giving rise to overly optimistic assessments of similarity between engineered and natural cell types. This general phenomenon is also true of commonly used batch correction methods, which decrease variation across conditions or experiments and thus can conflate technical variability with biological variability [69]. For this reason, great care must be taken when using imputation and batch correction methods in analysis pipelines whose goal is comparing cell similarity.
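The homogenization effect described above can be illustrated with a toy simulation. The following is a minimal sketch, not any published imputation method: it assumes Poisson counts with random dropout, uses simple k-nearest-neighbor averaging as a stand-in for imputation, and uses Pearson correlation with the natural-cell centroid as a stand-in similarity score. All names (`knn_smooth`, `similarity_to_reference`, `simulate`) and all parameter values are hypothetical.

```python
import numpy as np

def knn_smooth(X, k):
    """Replace each cell (row) by the mean of itself and its k nearest neighbors."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    nn = np.argsort(d2, axis=1)[:, : k + 1]         # each cell is its own nearest neighbor
    return X[nn].mean(axis=1)

def similarity_to_reference(cells, reference):
    """Pearson correlation of each cell (row) with a reference profile."""
    c = cells - cells.mean(axis=1, keepdims=True)
    r = reference - reference.mean()
    return (c @ r) / (np.linalg.norm(c, axis=1) * np.linalg.norm(r))

rng = np.random.default_rng(0)
n_genes, n_cells = 200, 50

# Assumed toy model: engineered cells differ from natural cells by a
# gene-specific fold change, so the two populations are NOT identical.
mu_nat = rng.gamma(shape=2.0, scale=2.0, size=n_genes)
mu_eng = mu_nat * np.exp(rng.normal(0.0, 0.5, size=n_genes))

def simulate(mu):
    """Log-transformed Poisson counts with 40% of entries zeroed out (dropout)."""
    counts = rng.poisson(mu, size=(n_cells, n_genes))
    keep = rng.random((n_cells, n_genes)) > 0.4
    return np.log1p(counts * keep)

natural = simulate(mu_nat)
engineered = simulate(mu_eng)
reference = natural.mean(axis=0)  # centroid of the natural cells

before = similarity_to_reference(engineered, reference).mean()

# "Impute" by smoothing over neighbors in the pooled data set.
smoothed = knn_smooth(np.vstack([natural, engineered]), k=10)
after = similarity_to_reference(smoothed[n_cells:], reference).mean()

print(f"mean similarity before smoothing: {before:.2f}")
print(f"mean similarity after smoothing:  {after:.2f}")
```

In this toy setting, the similarity of engineered cells to the natural reference rises after smoothing even though the two populations differ by construction, because each engineered profile is averaged with neighboring cells that include natural ones.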

Perhaps the biggest challenge for assessing cell similarity is the high-dimensional nature of scRNA-seq data, with RNA from thousands of genes being measured simultaneously in individual cells. High dimensionality requires the use of specialized methods for visualization and interpretation. A typical data analysis pipeline [7] for discovering and classifying cell types involves applying a dimensional reduction algorithm to summarize the information provided by thousands of genes down to just two dimensions (e.g., SPRING [106] or uniform manifold approximation and projection (UMAP) [9]), followed by clustering and the use of marker genes to assign clusters to cell types. Many analysis pipelines also restrict the features being analyzed to a smaller subset, such as highly variable genes. Finally, to assess the similarity of a new cell, its expression profile is compared to the expression profiles of the clusters, which serve as proxies for cell types.
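The steps of such a pipeline can be sketched on synthetic data. This is a simplified illustration under stated assumptions, not a recommended workflow: principal component analysis is used in place of SPRING or UMAP, a hand-written two-cluster k-means stands in for the clustering step, gene 0 is assumed to be a known marker of one cell type, and all data are simulated Gaussian values rather than real counts.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_per_type = 100, 40

# Toy reference atlas: two cell types that differ in the first 20 genes.
# Gene 0 plays the role of a known marker for "type_B" (an assumption).
base = rng.normal(0.0, 1.0, size=n_genes)
shift = np.zeros(n_genes)
shift[:20] = 3.0
type_a = base + rng.normal(0.0, 1.0, size=(n_per_type, n_genes))
type_b = base + shift + rng.normal(0.0, 1.0, size=(n_per_type, n_genes))
atlas = np.vstack([type_a, type_b])
query = base + shift + rng.normal(0.0, 1.0, size=n_genes)  # new "type_B"-like cell

# 1. Feature selection: keep the 20 most variable genes in the atlas.
hvg = np.argsort(atlas.var(axis=0))[-20:]

# 2. Dimensional reduction: PCA to two components (stand-in for UMAP/SPRING).
mean = atlas[:, hvg].mean(axis=0)
_, _, vt = np.linalg.svd(atlas[:, hvg] - mean, full_matrices=False)
components = vt[:2].T  # shape (n_hvg, 2)
embedding = (atlas[:, hvg] - mean) @ components

# 3. Clustering: k-means with k = 2, initialized at the two extremes of PC1.
centroids = embedding[[np.argmin(embedding[:, 0]), np.argmax(embedding[:, 0])]]
for _ in range(10):
    dists = np.linalg.norm(embedding[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([embedding[labels == j].mean(axis=0) for j in range(2)])

# 4. Annotation: name clusters by the mean expression of marker gene 0.
marker_means = [atlas[labels == j, 0].mean() for j in range(2)]
names = {int(np.argmin(marker_means)): "type_A",
         int(np.argmax(marker_means)): "type_B"}

# 5. Comparison: project the query cell and find the nearest cluster centroid.
q = (query[hvg] - mean) @ components
assigned = names[int(np.linalg.norm(centroids - q, axis=1).argmin())]
print("query cell assigned to:", assigned)
```

Note that steps 1 and 2 are both fitted to the atlas, so the assignment of the query cell depends on which reference cells were included.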

A powerful feature of this pipeline is that it allows for easy visualization and outputs biologically interpretable clusters. This allows experimentalists to quickly assess experiments, identify biologically interesting features, and develop novel biological hypotheses that can be tested in new experiments. Given its widespread adoption, it is worth highlighting a number of limitations of this analysis pipeline for assessing cell similarity. First, both the feature selection and the dimensional reduction depend on the datasets that are included in the analysis. The inclusion or exclusion of different datasets generally yields different results even for the same experiment. This dependence on choice of data is especially important when trying to assess cell similarity between closely related cell types or comparing natural and engineered cells. Second, dimensional reduction techniques are primarily useful for visualization and almost always fundamentally distort biological patterns that occur in the full high-dimensional space [14]. In general, axes and distances in a UMAP or SPRING plot have no natural biological interpretation, making it challenging to rigorously interpret biological experiments.
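The first limitation, dependence on the included datasets, can be made concrete with a small synthetic example. The sketch below assumes Gaussian toy data and a simple variance-based definition of highly variable genes; the function names and all numerical values are hypothetical and chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes = 100

def simulate(n_cells, start, stop, shift):
    """Gaussian toy expression with a mean shift in genes start..stop-1."""
    x = rng.normal(0.0, 1.0, size=(n_cells, n_genes))
    x[:, start:stop] += shift
    return x

def top_variable_genes(x, n_top=10):
    """Indices of the n_top genes with the largest variance across cells."""
    return set(np.argsort(x.var(axis=0))[-n_top:].tolist())

# Dataset A: two cell types that differ in genes 0-9.
dataset_a = np.vstack([simulate(40, 0, 0, 0.0), simulate(40, 0, 10, 3.0)])
# Dataset B: a third cell type that differs strongly in genes 50-59.
dataset_b = simulate(80, 50, 60, 6.0)

hvg_alone = top_variable_genes(dataset_a)
hvg_pooled = top_variable_genes(np.vstack([dataset_a, dataset_b]))
overlap = len(hvg_alone & hvg_pooled)

print("genes selected from A alone:", sorted(hvg_alone))
print("genes selected from A + B:  ", sorted(hvg_pooled))
print("shared genes:", overlap)
```

Here, adding dataset B replaces the genes that distinguish the two cell types in dataset A, so any downstream embedding or similarity score for the cells of dataset A would be computed on a different feature set.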

To address these limitations, the authors have developed a novel computational method for computing cell similarity at the single-cell level (single-cell type order parameters, scTOP) [45, 111]. scTOP utilizes insights developed from statistical physics-inspired epigenetic landscape models to assess cell similarity in a holistic manner [21, 48, 60, 85]. The inputs to scTOP are a “cell basis,” consisting of the transcriptomic profiles of cell types of interest (e.g., the expression profiles of lung cells in a cell atlas, such as the single-cell lung cell atlas) and the scRNA-seq measurements of the cell being analyzed (e.g., an engineered cell whose cell similarity we wish to compute). The output of scTOP is a set of cell similarity scores that characterize the cell similarity with each cell type in the “cell basis” (e.g., a score to be a club cell, AT1 cell, AT2 cell, etc.). An appealing feature of the scTOP algorithm is that it allows for biologically meaningful visualizations and sensitive and accurate classification of cell similarity using scRNA-seq data [45]. Importantly, the method requires neither dimensional reduction nor feature selection, produces visualizations with meaningful axes, and does not depend on what other data are being analyzed. One limitation of scTOP is that it requires experimentalists to specify cell types of interest ahead of time through the choice of cellular basis. For this reason, scTOP cannot be used to assess or discover novel cell types. Nonetheless, we have found scTOP to be an excellent tool for comparing engineered and natural cells, where the cells of interest are often known [111].
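To give a sense of how a basis-projection score of this kind might be computed, a minimal sketch follows. It is not the scTOP package or its API. The text above does not spell out the computation, so the sketch assumes (i) a rank-based normalization of each expression profile to normal scores and (ii) a least-squares projection of the sample onto a non-orthogonal basis of reference profiles. The function names, the synthetic profiles, and the use of the labels club, AT1, and AT2 for random data are all illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

def rank_zscore(x):
    """Map expression values to normal scores via their ranks.

    Assumes no tied values; real scRNA-seq data contain many ties (zeros)
    and would need an explicit tie-handling rule.
    """
    ranks = np.argsort(np.argsort(x)) + 1  # ranks 1..n
    quantiles = ranks / (len(x) + 1)       # strictly between 0 and 1
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(p) for p in quantiles])

def similarity_scores(basis, sample):
    """Least-squares coordinates of `sample` in the columns of `basis`.

    basis:  (n_genes, n_types) normalized reference profiles
    sample: (n_genes,) normalized profile of the cell being analyzed
    """
    scores, *_ = np.linalg.lstsq(basis, sample, rcond=None)
    return scores

rng = np.random.default_rng(2)
n_genes = 500
cell_types = ["club", "AT1", "AT2"]

# Synthetic reference profiles (random numbers, not real atlas data).
profiles = rng.lognormal(0.0, 1.0, size=(n_genes, len(cell_types)))
basis = np.column_stack(
    [rank_zscore(profiles[:, j]) for j in range(len(cell_types))]
)

# A noisy cell generated from the third reference profile.
cell = profiles[:, 2] * rng.lognormal(0.0, 0.5, size=n_genes)

scores = similarity_scores(basis, rank_zscore(cell))
best = cell_types[int(np.argmax(scores))]
for name, score in zip(cell_types, scores):
    print(f"{name}: {score:.2f}")
print("highest score:", best)
```

Under these assumptions, the scores depend only on the chosen basis and the cell being analyzed, with no feature selection or embedding fitted to other data. Using a least-squares projection rather than separate correlations means that similarity shared among correlated reference types is not counted more than once.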

Assessing cell similarity of progenitor populations is an especially computationally challenging task [108]. Transcriptome profiles of developmental progenitor cell types are generally more similar to each other than their adult counterparts, highlighting the need to collect scRNA-seq data on multiple progenitor populations. Furthermore, during the course of development, progenitor cell populations often rapidly change their transcriptome profiles as they differentiate, making it difficult to accurately measure transcriptomes in vivo.

This has important implications for thinking about lung progenitors at various developmental stages. Even when in vivo progenitors can be stabilized and expanded ex vivo, their transcriptomes can be altered due to radical differences between in vivo and ex vivo environments [42]. Hence, it is important to decide how to benchmark in vitro engineered cells, and characterization of progenitor-based organoids should extend beyond similarity scores to encompass functional differentiation, global chromatin accessibility, and proteomics.

6 Conclusion

As our understanding of lung multipotent progenitor biology deepens, the range of potential applications increases (Fig. 4.2). Ex vivo or engineered PSC-derived lung organoids have already been deployed for modeling of cell fate decisions and production of cells of clinical relevance. Tissue interactions and morphogenetic processes have been long-standing areas of interest in developmental biology [107], and one could expect modern-day organoids to be increasingly used to similar ends. Other future applications include respiratory disease modeling and the study of lung-microbiome interactions, thus reducing the need for animal experimentation. Ultimately, lung progenitor-based organoids will become widely and routinely used tools in medicine and developmental biology.

Fig. 4.2
An illustration has 3 sections. The fetal lung by dissection sorting and pluripotent stem cells by directed differentiation points to progenitor organoids which points to multimodal profiling. It further has cell fate decisions, infection models, and transplantation.

Lung progenitor-derived tractable organoids, current and future uses