Introduction

Understanding how the SARS-CoV-2 virus replicates and causes damage in the upper and lower airways has been the focus of intense scientific research for the past 4 years. The virus replicates in airway epithelial cells, and its receptor ACE2 is expressed throughout the airways and lung, although its expression declines from the airways toward the alveolar space1. Notably, replication of the Omicron variant appears to be confined to the upper airway tract, a finding that correlates with less severe disease2,3.

Removal and neutralization of potentially harmful substances from inhaled air are main functions of the airway epithelium, which consists of multiple cell types. (1) Basal cells are the progenitor cells that differentiate into the epithelial cell types. (2) Club cells secrete the immunomodulatory club cell secretory protein and also fulfill regenerative functions. (3) Goblet cells secrete mucins onto the internal surface of the respiratory tract, thereby forming a liquid layer (termed mucus) to protect the underlying epithelium. (4) Ciliated cells are located across the apical surface and facilitate the movement of mucus across the airway tract4. In the lungs, alveolar epithelial cells (ATI and II cells) line the alveoli where gas exchange takes place.

The viral spike protein has been found in the airway epithelium and lung tissue of deceased COVID-19 patients5. In ex vivo infections, ciliated cells were identified as natural targets of SARS-CoV-26. However, evidence of infection was also noted in secretory and basal cells, in both in vivo and ex vivo infections7. This includes viral budding in cells with secretory vesicles in infected human bronchial epithelial cells8; others found goblet cells infected in in vitro human airway epithelial cell cultures and, as a response to infection, showed increased mucus production9.

To recapitulate the complex cellularity of the airway epithelium and provide a physiological, yet workable model of the infection in humans, multiple stem cell-derived 3D organoid models have been applied to virology7,10,11,12. Organoids consist of multiple cell types grown in a 3D structure to mimic organ structure. They can be derived from embryonic stem cells (ESCs), induced pluripotent stem cells (iPSCs) or progenitor cells in adult tissues. SARS-CoV-2 replicates in iPSC- or ESC-derived human airway organoids (HAOs)13,14 and adult stem-cell HAOs that encompass both airway and alveolar cells15,16. In the latter model, donor lungs are digested and cultured in Matrigel and a medium that allows for the outgrowth of basal, club and goblet cells, which can then include ciliated cells with additional ex vivo differentiation steps4,17.

Differentiation, often performed at the air–liquid interface, produces a pseudostratified configuration that most closely resembles the in vivo situation, but is lengthy, requires large volumes of cells, and loses 3D structure. The undifferentiated organoids provide an intermediate model system that is more structured and heterogeneous than a cell line, yet less rigid than the fully differentiated model. These undifferentiated HAOs are tractable and easy to work with, while not being cancerously transformed and maintaining cellular diversity that resembles the airway epithelium. The undifferentiated organoid model has been used to study respiratory syncytial virus17 and SARS-CoV-26,18,19,20. However, infection rates in this model have been variable and donor-dependent18,19,20.

To address these challenges and opportunities for studying SARS-CoV-2 infection, we generated and explored an adult stem cell-derived undifferentiated HAO model. Our model was genetically modified with ACE2 overexpression to overcome obstacles with low infection rates and long and variable differentiation times. We studied the effects of infection in this system using a genetically diverse set of variants, and applied scRNA-seq to understand the cell-intrinsic response to SARS-CoV-2 in this primary cell model. We identified NFKBIA, a critical regulator of the NF-κB pathway, as a universal gene that marks highly infected cells across all viral variants, underscoring the recognized link between inflammation and SARS-CoV-2 infection that dominates acute respiratory distress syndrome and severe disease in the lung.

Results

HAOs were grown from digested uninfected donor lungs, according to a published protocol (Fig. 1A)17. As expected and confirmed by light sheet microscopy, the majority of cells (~ 80%) in one organoid expressed P63, the basal cell marker, and a minority (~ 20%) represented secretory cells, including MUC5AC+ goblet and CC10+ club cells, which aligns with published reports of undifferentiated HAOs (Fig. 1B)17. Due to the absence of cell differentiation, very few FOXJ1+ ciliated cells were observed (not shown). Compared to primary lung epithelium, markers for basal cells (P63, KRT5) were statistically overrepresented in quantitative RT-PCR analysis of organoids, and those for ciliated cells (FOXJ1), goblet cells (MUC5AC), and club cells (CC10) were underrepresented (Fig. 1C). Expression of ACE2 mRNA was also higher in primary lung tissue than in the undifferentiated organoids, and no significant difference was noted in TMPRSS2 mRNA levels (Fig. 1D,E). No ACE2 protein was detected in the organoids by western blotting, which explains their poor infection efficiency in previous studies (Fig. 1F,G, Fig. S1)18.

Figure 1

Characterization of human airway organoids (HAOs) and overexpression of ACE2. (A) Schematic detailing the establishment of organoid culture and ACE2 overexpression. (B) Light sheet microscopy of HAOs with staining for the cell markers CC10 (club cells), MUC5AC (goblet cells), and p63 (basal cells), with a brightfield image of a single organoid in the lower right-hand corner. (C) RT-qPCR comparison of cell-marker gene expression between digested primary lung tissue and organoids (N = 3, organoids; N = 4, input; ± SD shown). (D) RT-qPCR comparing ACE2 expression level between organoids and digested lung tissue. (E) RT-qPCR comparing TMPRSS2 expression level between organoids and digested lung tissue (N = 3, organoids; N = 4, input; ± SD shown). (F) Representative western blot showing ACE2 protein levels in organoids at baseline and with ACE2 overexpression. (G) Quantification of ACE2 protein levels in wildtype organoids and organoids overexpressing ACE2. (H) Confocal imaging comparing ACE2 expression levels in wildtype and ACE2-overexpressing organoids. (I) RT-qPCR comparison of cell-marker genes after differentiation at the air–liquid interface of wildtype and ACE2-overexpressing organoids (N = 2, ± SD shown).

To overcome these issues, undifferentiated organoids were transduced with a lentiviral vector expressing the ACE2 open reading frame. ACE2-overexpressing organoids (ACE2-OE) were selected and maintained robust ACE2 levels over at least nine culture passages as shown by western blotting (Fig. 1F,G). This was also confirmed by confocal immunofluorescence microscopy where ACE2 protein expression was visible on the surface of most cells in the ACE2-OE organoids (Fig. 1H). Organoid differentiation at the air–liquid interface was unaffected by ACE2 overexpression, as differentiation to all four cell types was similar between wildtype (WT) and ACE2-OE organoids as assessed by RT-qPCR for epithelial cell markers (Fig. 1I).

Next, WT and ACE2-OE undifferentiated HAOs were infected with the SARS-CoV-2 WA1 ancestral strain at a multiplicity of infection (MOI) of 1. Culture supernatants were collected at 24 and 72 h and subjected to plaque assays. ACE2 overexpression resulted in approximately two log higher infectious particle production at 24 h and a log difference at 72 h (Fig. 2A–C, Fig. S2). As a second measure of active viral infection, infected organoids were subjected to confocal immunofluorescence microscopy after staining with an antibody against double-stranded (ds) RNA. This antibody specifically stains the RNA replication centers in infected cells containing positive–negative strand RNA hybrids21,22,23,24,25. dsRNA+ cells in ACE2-OE organoids had two- to threefold greater ACE2 expression than globally observed in WT organoids (Fig. 2D,E). These results show a robust increase in infected cell numbers with ACE2 overexpression that may now allow downstream analysis of SARS-CoV-2 infection at the single cell level.
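The plaque-assay comparison above rests on two routine calculations: converting plaque counts into a titer and expressing the difference between conditions in log10 units. The following is a minimal sketch of that arithmetic, assuming the standard titer formula (plaques divided by the product of dilution and plated volume). The dilutions, volumes, cell counts and function names are hypothetical and are not taken from the study.

```python
import math


def titer_pfu_per_ml(plaque_count, dilution, volume_ml):
    """Titer from one countable well: plaques / (dilution * plated volume in mL)."""
    if plaque_count < 0 or dilution <= 0 or volume_ml <= 0:
        raise ValueError("plaque_count must be >= 0; dilution and volume must be > 0")
    return plaque_count / (dilution * volume_ml)


def log10_difference(titer_a, titer_b):
    """How many log10 units titer_a lies above titer_b."""
    return math.log10(titer_a / titer_b)


def inoculum_volume_ml(cell_count, moi, stock_titer_pfu_per_ml):
    """Volume of virus stock needed to reach a given MOI on a known number of cells."""
    return cell_count * moi / stock_titer_pfu_per_ml


# Hypothetical wells: 30 plaques at a 1e-3 dilution vs 30 plaques at a 1e-1 dilution,
# both with 0.2 mL plated.
ace2_oe = titer_pfu_per_ml(30, 1e-3, 0.2)
wildtype = titer_pfu_per_ml(30, 1e-1, 0.2)
print(f"ACE2-OE: {ace2_oe:.3g} PFU/mL, WT: {wildtype:.3g} PFU/mL")
print(f"difference: {log10_difference(ace2_oe, wildtype):.2f} log10")
```

In this illustration, a "two log" difference corresponds to a 100-fold higher titer; the same helper for inoculum volume shows how an MOI of 1 translates into a stock volume once the cell number is known.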

Figure 2

Infection of HAOs with SARS-CoV-2. Organoids were infected at an MOI of 1 for 2 h, then washed, and fresh medium was added. Organoids were incubated for the length of time indicated. (A) Plaque assay comparing wildtype and ACE2-overexpressing organoids at 24 h post-infection. (B) Plaque assay comparing wildtype and ACE2-overexpressing organoids at 72 h post-infection. (C) Representative images of plaque assays performed on supernatant from SARS-CoV-2 infection of wildtype and ACE2-overexpressing organoids. (D) Representative images of infected HAOs at 24 and 72 h stained for dsRNA (green). (E) Quantification of the percentage of total DAPI+ cells expressing dsRNA across three experiments at 24 and 72 h.

To test this, WT and ACE2-OE organoids were subjected to 10× scRNA-seq after infection with the SARS-CoV-2 WA1 strain. Raw sequencing data were aligned to the human genome with the SARS-CoV-2 genome appended, run through scVI26,27, and represented in a joint (batch-corrected) low dimensional space (see “Methods”). In that space, the WT infected (WT_I), ACE2 uninfected (ACE2_U) and WT uninfected (WT_U) conditions clustered together, and a large subset of the ACE2 infected (ACE2_I) cells clustered separately (Fig. 3A,B, Fig. S4).

Figure 3

Single-cell RNA-sequencing of WT and ACE2 OE organoids infected with SARS-CoV-2. (A) UMAP colored by overexpression condition of organoids. (B) UMAP colored by overexpression condition and infection condition. (C) UMAP colored by log expression of SARS-CoV-2 RNA. (D) UMAP colored by GMM infection classification. (E) UMAP colored by cell type identification. (F) Dot plot showing levels of interferon genes and IL-6 RNA grouped by infection level. (G) Dot plot showing levels of interferon genes and IL-6 grouped by overexpression condition. (H) Dot plot showing levels of top differentially expressed ISGs grouped by infection level. ISGs were selected from the list of top 50 differentially expressed genes between the high-infected and uninfected cells. (I) Dot plot showing levels of top differentially expressed ISGs grouped by overexpression background.

To further validate the presence of cell types in the organoids identified by microscopy, the Lung Cell Atlas28 was used as a reference to assign cell types to our scRNA-seq data. In agreement with our previous results (Fig. 1), most cells were basal cells, and the remainder were respiratory goblet cells, a secretory cell type present in the lungs (Fig. 3E). Only a small number were identified as ciliated cells (Fig. 3E). Furthermore, this annotation revealed no significant difference in the proportion of each cell type when ACE2 and WT organoids were compared, indicating that ACE2 overexpression does not change the cellular composition of organoids (Supp. Fig. 4A).

To further confirm that overexpression of ACE2 did not change the biology of the organoids, a differential gene expression analysis was conducted between WT_U and ACE2_U samples (Supp. Fig. 4B). ACE2 was identified as the highest differentially expressed gene, and only four other genes (C20orf85, BPIFA1, TMEM190, UTP14C) were differentially expressed with a q-value less than 0.01. Considering SARS-CoV-2 RNA, we found expression primarily in cells from the ACE2_I condition (Fig. 3C, Supp. Fig. 4C–F). These results are consistent with our previous data (Fig. 2), showing that SARS-CoV-2 infection levels are higher in the ACE2 condition than the WT. They further suggest that the difference in infection levels between the organoids is most likely due to ACE2 overexpression and not altered abundance of other receptors or cofactors.
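The q-value threshold mentioned above can be illustrated with a small multiple-testing correction. The text does not state how the q-values were computed, so the sketch below uses the Benjamini-Hochberg procedure purely as a common stand-in, not as the authors' method. The p-values are invented for illustration and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices, smallest p first
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone in rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        qvalues[idx] = running_min
    return qvalues


# Invented p-values for five hypothetical genes.
pvals = [0.001, 0.008, 0.039, 0.041, 0.9]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.01]
print(qvals, significant)
```

With a cutoff of 0.01, only the first invented gene passes, which shows how a strict threshold leaves a short list of differentially expressed genes.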

Considering only cells from the infected conditions (ACE2_I, WT_I), we observed a broad cell-to-cell variation in the abundance of viral products, presumably reflecting different levels of infection. To distinguish cells that had productive SARS-CoV-2 infections from those that had little to no infection, we fitted a Gaussian Mixture Model (GMM) to the overall amount of viral product in each cell, so as to classify cells from the infected samples into high, low and no infection groups (accounting for possible ambient viral product; see “Methods” and Fig. 3D). As expected, most successfully infected cells (low and high infection groups in our model) came from the ACE2_I condition: 8.24% of ACE2-OE cells were in the high group and 15.01% in the low group, compared with 1.47% and 0.04% of WT cells, respectively. Conversely, a subset of ACE2_I cells from the “no infection” group of cells clustered together with cells from the uninfected samples (ACE2_U, WT_U), suggesting that some cells in the infected organoid were not only uninfected but also not influenced by the infection of other cells.
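The GMM-based classification described above can be illustrated with a small, self-contained example. This is a minimal sketch of a three-component, one-dimensional Gaussian mixture fitted by expectation-maximization, assuming the input is a log-transformed viral read count per cell. The transformation, the initialization, the toy values and all function names are assumptions for illustration; the fitting details actually used are described in “Methods”.

```python
import math


def _responsibilities(x, weights, means, variances):
    """Posterior probability of each component for one value."""
    dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    if total == 0.0:  # numerically far from every component: use the nearest mean
        nearest = min(range(len(means)), key=lambda j: abs(x - means[j]))
        return [1.0 if j == nearest else 0.0 for j in range(len(means))]
    return [d / total for d in dens]


def fit_gmm_1d(values, k=3, iterations=200, min_var=1e-4):
    """Fit a k-component 1D Gaussian mixture by EM; components sorted by mean."""
    xs = sorted(values)
    n = len(xs)
    # Deterministic start: means at evenly spaced quantiles, one shared variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    spread = (xs[-1] - xs[0]) / k
    variances = [max(spread ** 2, min_var)] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibilities of every component for every cell.
        resp = [_responsibilities(x, weights, means, variances) for x in values]
        # M-step: update weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, min_var)
    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order],
            [means[j] for j in order],
            [variances[j] for j in order])


def classify_infection(values, labels=("none", "low", "high")):
    """Label each cell with the mixture component of highest responsibility."""
    weights, means, variances = fit_gmm_1d(values, k=len(labels))
    calls = []
    for x in values:
        r = _responsibilities(x, weights, means, variances)
        calls.append(labels[max(range(len(r)), key=lambda j: r[j])])
    return calls


# Toy log-scale viral loads for 14 hypothetical cells.
log_viral = [0.0, 0.1, 0.2, 0.1, 0.0, 0.15, 3.0, 3.2, 2.9, 3.1, 7.0, 7.3, 6.8, 7.1]
calls = classify_infection(log_viral)
print(list(zip(log_viral, calls)))
```

Sorting the components by mean is what lets the lowest component absorb ambient signal as the "none" group, while the upper two separate low-level from high-level infection.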

To determine whether the successfully infected cells were detecting the viral replication, we assessed their levels of interferon gene expression. We found no expression of IFN-α, -β, or -γ in our scRNA-seq data. This result is similar to data published from COVID-19 patient samples in which IFN-λ is the primary response29. Interestingly, while we found IFN-λ expression only in infected samples, the highest amounts were observed in cells with low levels of infection, with substantially lower expression in cells from the no-infection or high-infection groups (Fig. 3F,G). While these data help to demonstrate that these organoids recapitulate the response of lungs during infection, they also suggest that the anti-viral IFN-λ response is only active in cells with low-level infections.

Unlike the ACE2_I sample, we found comparably little expression of interferon genes in the WT_I sample (Fig. 3G). To further interrogate this, we looked at the levels of interferon-stimulated genes (ISGs) in the different groups of cells. Consistently, we found that ISGs were primarily expressed in ACE2_I cells with low levels of viral RNA and not in the other infection groups or in the WT cells (Fig. 3H,I). Finally, for an unbiased analysis, we examined the differential expression in infected and uninfected samples in each of our genetic backgrounds (ACE2 OE and WT). Using gene set enrichment analysis, we found strong enrichment for interferon and anti-viral response programs in the ACE2-overexpressing background, and much less so when comparing the WT samples (Supp. Fig. 5A,B). Our results therefore suggest that the lack of viral replication in WT cells is not due to an immune response, but more likely a result of limited viral entry and replication.

Next, we compared transcriptome responses at the single-cell level in a second set of ACE2-OE organoids, either uninfected or infected with various viral variants. Besides WA1 (using similar conditions as above), we tested the Alpha variant (B.1.1.7), two separate isolates of the Beta variant (B.1.351), and the California-resident Epsilon variant (B.1.429). Irrespective of cell type, the cells from the infected organoids were well mixed, and most uninfected cells clustered separately (Fig. 4A), indicating no clear difference or outlier among the infections (Fig. 4B). Basal cells remained the overwhelming cell type in this experiment, with a smaller representation of goblet cells (Fig. 4C). We again used the GMM approach for classifying the level of infection within each cell (Fig. 4D). As in Fig. 3, we saw activation of IFN-λ and ISGs across all samples infected by the variants but not in the uninfected sample (Fig. 4E,F).

Figure 4

Single-cell RNA-sequencing of airway organoids infected with SARS-CoV-2 variants. (A) UMAP showing distribution of cells from each variant infection. (B) UMAP colored by infected and uninfected conditions. (C) UMAP colored by cell type. (D) UMAP colored by SARS-CoV-2 infection level (via GMM classification). (E) Dot plot showing levels of interferon genes and IL-6 grouped by variant. (F) Dot plot showing levels of ISG expression grouped by variant. (G) Heatmap showing top five genes correlated to SARS-CoV-2 RNA level grouped by variant. The number in the box shows the ranking of the gene based on correlation for the variant and coloring shows the mean expression of the gene listed on the left. Heatmap was made using Python package Matplotlib version 3.5.2 (https://matplotlib.org/stable/). (H) UMAP colored by log expression of NFKBIA RNA. (I) UMAP colored by log expression of SARS-CoV-2 RNA. (J) Infection level proportions per variant.

In addition, all variants had similar infection levels with one of the Beta strains having a slightly higher level (Fig. 4J). However, a comparison of the replication dynamics over time of Beta B, Beta A and WA-1 showed no significant differences, suggesting that the higher levels of virus are more likely due to the variability of the assay and not to an intrinsic difference in these viruses (Supp. Fig. 6). These results show the functionality of the organoid system across viral variants.

To determine which genes were most consistently upregulated with infection across the variants, we computed the correlation between gene expression and viral product (over cells) for each variant. The genes were then ranked by their level of correlation, separately for each variant (Fig. 4G; Table S1). Interestingly, we found a high level of consistency among the resulting lists of top ranked genes, with CXCL3 being the top ranked in four out of the five variants, TNFAIP3 in the top three in four of the variants, and NFKBIA in the top five in all variants. NFKBIA had the highest expression of the top-correlated genes, and had high expression in the high infection cluster (Fig. 4H). This cluster also had the highest levels of SARS-CoV-2 viral products (Fig. 4I). NFKBIA is a quintessential NF-κB response gene and encodes the IκBα protein, which in turn dampens NF-κB activity by preventing its translocation into the nucleus30. This usually generates a negative feedback loop where NF-κB activity is terminated or temporarily restrained by the upregulation of IκBα. The fact that NFKBIA is the top-upregulated gene in the analysis indicates strong activation of the NF-κB signaling pathway in infected cells across variants.
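The per-variant ranking described above can be sketched in a few lines. The example below assumes Pearson correlation, since the text does not specify the correlation type. It ranks genes by their correlation with viral RNA within each variant and then intersects the top-ranked lists. The placeholder genes (GENE_X, GENE_Y), the toy expression values and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def rank_genes(expression, viral_rna):
    """Gene names ordered from most to least correlated with viral RNA across cells."""
    scores = {gene: pearson(values, viral_rna) for gene, values in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)


def shared_top_genes(rankings_by_variant, top_n):
    """Genes that appear in the top_n of every variant's ranking."""
    top_sets = [set(ranking[:top_n]) for ranking in rankings_by_variant.values()]
    return set.intersection(*top_sets)


# Toy data: four cells per variant, viral RNA level and three genes.
viral = [0, 1, 2, 3]
rankings = {
    "WA1": rank_genes(
        {"NFKBIA": [1, 2, 3, 4], "GENE_X": [4, 3, 2, 1], "GENE_Y": [1, 1, 2, 1]}, viral),
    "Alpha": rank_genes(
        {"NFKBIA": [2, 2, 3, 5], "GENE_X": [0, 1, 2, 3], "GENE_Y": [3, 1, 2, 0]}, viral),
}
print(rankings)
print(shared_top_genes(rankings, top_n=2))
```

Ranking within each variant before comparing lists keeps the comparison independent of variant-specific differences in overall viral load.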

To investigate this finding in greater depth, we confirmed upregulation of the IκBα protein by confocal microscopy in A549 epithelial cells overexpressing ACE2, as they form a homogenous monolayer on the cover slips. Cells were infected with SARS-CoV-2 WA-1 and co-stained for dsRNA and IκBα to differentiate between infected and uninfected cells. Cells were imaged, and the mean fluorescence intensities (MFI) of IκBα were compared in dsRNA+ and dsRNA− cells. IκBα protein expression was enhanced in infected (dsRNA+) versus noninfected bystander (dsRNA−) cells, mirroring the induction of NFKBIA transcript levels in the scRNA-Seq analysis (Fig. 5A). In a parallel experiment, we co-stained infected A549-ACE2 cells with antibodies against dsRNA and the p65/RELA subunit of NF-κB and quantified p65 nuclear localization. Of the dsRNA+ cells, 75% showed p65 nuclear localization, but only a few dsRNA− cells did (Fig. 5B,C). This confirms a strong activation of the NF-κB signaling pathway only in infected, and not in uninfected, cells, which explains the consequent upregulation of NFKBIA as a response gene. However, the resultant increase in IκBα activity appears insufficient to fully downregulate NF-κB signaling in cells infected with SARS-CoV-2, as the majority of infected cells localize p65 in the cell nucleus.

Figure 5

Role of IκBα in SARS-CoV-2 infection. (A) Quantification of IκBα protein levels measured by fluorescent microscopy in SARS-CoV-2 infected dsRNA+ and dsRNA− A549-ACE2 cells. (B) Quantification of p65 translocation in SARS-CoV-2 infected dsRNA+ and dsRNA− A549-ACE2 cells. (C) Representative image of p65 (red) translocation in infected dsRNA+ (green) A549-ACE2 cells. (D) Quantification of p65 translocation in mutant IκBα overexpressing, empty vector expressing, and untransduced A549-ACE2 cells stimulated with TNFα. (E) Representative image of IκBα (HA) and p65 staining in TNFα stimulated mutant IκBα and untransduced A549-ACE2 cells. (F) Quantification of percent infection by dsRNA+ staining in mutant IκBα overexpressing, empty vector expressing and untransduced A549-ACE2 cells. Three replicates were done, and a minimum of 5000 cells were analyzed per condition; standard deviation is indicated.

The IκBα protein can be phosphorylated by upstream kinases in a process that destabilizes the inhibitor and as a consequence facilitates p65 nuclear translocation and transcriptional activity31. To further strengthen IκBα function in the infectious process, we stably overexpressed an IκBα construct in A549-ACE2 cells with the serine residues at positions 32 and 36 mutated to alanine to prevent phosphorylation32. This generates a “super-inhibitor” that is unresponsive to cell signals and is highly active in retaining p65 in the cytoplasm, preventing its transcriptional activity in the nucleus32. First, we confirmed the proper function of the mutated inhibitor by stimulating A549-ACE2 cells transfected with the inhibitor with TNFα, normally leading to phosphorylation and degradation of IκBα and consequent nuclear NF-κB translocation33. In nontransfected cells, 100% of cells had nuclear translocation of p65 in response to TNFα (Fig. 5D,E). However, in cells expressing nondegradable IκBα, only 25% of cells showed p65 nuclear translocation, confirming that the super-inhibitor was active, keeping p65 in the cytoplasm. Co-staining of p65 and the HA-tagged IκBα mutant revealed that the remaining p65 translocation occurred in cells with low expression of the IκBα mutant (Fig. 5E, Fig. S7).

When cells expressing the IκBα mutant were infected with SARS-CoV-2 WA1, infected cultures showed two-fold more dsRNA+ cells than the parent cell line, supporting the model that host IκBα promotes viral infection by restraining the NF-κB response (Fig. 5F).

Discussion

The organoid model presented here adds another tool to the studies of SARS-CoV-2 infection in primary airway cells. It overcomes the hurdle of donor-dependent variable infection rates18 and may present a tractable intermediate between cell line studies and fully differentiated primary cell models. We acknowledge that ACE2-OE may have more rapid virally-induced cell death, a potential limitation to this study. However, the organoids are able to restrict viral infection over the 72 h timepoint, suggesting that despite higher infection in the ACE2-OE HAOs, the cells are not intrinsically overwhelmed by virus (Fig. 2A–B). Our data showing that ACE2-OE organoids recapitulate transcriptional changes seen in vivo, such as the induction of IFN-λ, also support the validity of the model.

We found infection in three cell types: stem-like basal cells and secretory goblet and club cells, with high viral replication associated with infection level. Transcriptional profiling in all three cell types yielded common transcriptional changes as indicated by clustering according to infection status by different viral variants. NFKBIA stood out as a gene of interest in controlling SARS-CoV-2 infection across variants and a highly induced gene whose mRNA levels positively correlated with viral RNA levels.

These data are consistent with published reports showing high levels of expression of NF-κB signaling-related genes in SARS-CoV-2-infected cells34. The overexpression of a mutant IκBα protein that cannot be phosphorylated and consequently degraded and the resulting five-fold increase in dsRNA+ cells support this model and identify IκBα as a bona fide proviral factor. This is corroborated by studies by Sunshine et al., who identified NFKBIA in a Perturb-Seq study as a host protein supporting viral replication35.

It remains to be determined what the relevant upstream and downstream signals of this proviral phenotype are and whether the continuous presence of NF-κB in the cell nucleus provides benefits for the SARS-CoV-2 lifecycle. Our findings indicate that after overexpression of the mutant IκBα protein, nuclear p65 levels were lower in infected cells, supporting a model in which IκBα promotes viral infection by constraining, albeit not completely, the well-known antiviral activities of NF-κB. However, knockdown studies of p65 have pointed to a positive role of the factor in SARS-CoV-2 infection that needs further mechanistic exploration36. The fact that nuclear p65 is a striking differentiator of infected from bystander cells in our confocal microscopy studies underscores a possible role of nuclear p65 in SARS-CoV-2 infection.

Materials and methods

Cell lines

A549 cells were cultured in complete DMEM (DMEM with 10% FBS, 1% penicillin–streptomycin, 1% glutamine). A549-ACE2 cells were made as described37 and cultured in complete DMEM supplemented with 10 μg/mL blasticidin. Vero-TMPRSS2 cells were a gift of Dr. Raul Andino and A549 cells were a gift of Dr. Andreas Puschnik.

A549-ACE2-IκBα (SS32,36AA) cells were generated by transducing A549-ACE2 cells with a lentiviral vector expressing IκBα (SS32,36AA) and selected in complete DMEM with 2 μg/mL of puromycin and 10 μg/mL of blasticidin. pLVX-EF1α-IκBα (SS32,36AA)-IRES-Puro was cloned by PCR amplification of HA-tagged IκBα (SS32,36AA) from pCMV4-3 HA/IκBα (SS32,36AA) followed by Gibson assembly. pCMV4-3 HA/IκBα (SS32,36AA) was a gift from Warner Greene (Addgene plasmid #24143), and pLVX-EF1α-IRES-Puro was purchased from Clontech Laboratories (plasmid #631988).

293T-HA-R-Spondin1-Fc cells were purchased from Trevigen (Catalogue Number 3710-001-K) and cultured according to the manufacturer’s protocol to generate R-spondin-1 conditioned medium. Briefly, cells were grown in selection growth media (DMEM with 10% FBS, 1% penicillin–streptomycin, 1% glutamine, and 100 mg/mL Zeocin) for > 5 days until they were > 90% confluent. Medium was replaced with organoid basal media (Advanced DMEM/F12 from Invitrogen supplemented with 1% penicillin–streptomycin, 1% Glutamax, and 10 mM HEPES). After 3 days, cell supernatant (i.e., R-spondin-1 conditioned medium) was collected, centrifuged at 3000×g for 15 min, filtered through a 0.22-µm filter, and frozen at − 20 °C in 10-mL aliquots. This process was repeated by adding fresh organoid basal medium to the cells and collecting supernatant after 4 days.

Culture of human airway organoids

To generate HAOs, input cells were obtained from whole-lung lavages or from digestion of whole lungs. HAOs were generated from these cells as described17,38. Briefly, single cells were suspended in 65% reduced growth factor BME2 (Basement Membrane Extract, Type 2, Trevigen, catalogue number 3533-001-02). From this mixture, 50-µL drops containing 1000–40,000 cells were seeded in 24-well suspension culture plates (Greiner Bio-One, catalogue number 662-102). Drops were incubated at 37 °C for > 20 min to solidify. After this, 500 µL of HAO medium was added to each well. HAO medium is organoid basal media (Advanced DMEM/F12 supplemented with 1% penicillin–streptomycin, 1% Glutamax, and 10 mM HEPES) supplemented with 10% (vol/vol) R-spondin-1 conditioned medium, 1% B27 (Gibco), 25 ng/mL Noggin (Peprotech), 1.25 mM N-acetylcysteine (Sigma-Aldrich), 10 mM nicotinamide (Sigma-Aldrich), 5 nM Heregulin beta-1 (Peprotech), and 100 µg/mL of Primocin (InvivoGen). HAO medium was further supplemented with 5 µM Y-27632, 500 nM A83-01, 500 nM SB202190, 25 ng/mL of FGF-7, and 100 ng/mL of FGF-10 (all from Stem Cell Technologies), and HAO medium was replaced every 3–4 days.

After 14–21 days, organoids were passaged. For this, cold basal medium was used to collect organoids in 15-mL Falcon tubes and to dissolve the BME2. Tubes were centrifuged at 250×g for 5 min at 4 °C. Medium was aspirated, 10× TrypLE Select (Gibco) was added to the organoids, and the mixture was incubated at 37 °C for 5–10 min. Organoids were further dissociated by pipette mixing and then diluted in cold basal medium. After another spin and medium aspiration, cells were mixed with BME2 and seeded into new drops. After this initial passage, organoids were passaged every 10–21 days. Stocks of early-passage (P1–P3) organoid lines were prepared by dissociating organoids, mixing them with recovery cell culture-freezing medium (Gibco), and freezing them by standard procedures. These samples could be thawed and immediately cultured in HAO medium.

Creation of HAO-ACE2 lines

HAO-ACE2 lines stably expressing hACE2 were generated by lentiviral transduction with plasmid LV-ACE239, and selected with 2 µg/mL of blasticidin, as described in a study from our group40.

Differentiation of HAOs at the air–liquid interface

HAOs were cultured at the air–liquid interface (ALI) for differentiation. Organoids were collected in cold basal medium and dissociated as described above. After spin and medium aspiration, cells were resuspended in warm basal medium with 2% (v/v) BME2 and seeded onto 6.5-mm inserts of a 24-well plate Transwell Permeable Support (Costar) pre-coated with 2% BME2. Cells were incubated for 1 h at 37 °C, and then 500 µL of warm HAO medium was added to the bottom wells of the 24-well plate. Cells were grown to confluency for 1 week before the ALI was established by removing the apical medium and replacing the basolateral medium with a 1:1 (v/v) mixture of HAO medium and Pneumacult ALI basal medium (Stem Cell Technologies, 05002). Basolateral medium was changed every 3–4 days, and mucus was washed from the apical side. Cells were differentiated at the ALI for at least 1 month.

Real-time quantitative PCR

The Qiagen RNeasy and Zymo Direct-zol RNA MiniPrep kits were used for RNA extraction and isolation according to the manufacturers’ instructions. Final RNA concentrations were measured with a NanoDrop ND-1000. Total RNA was reverse-transcribed using oligo(dT)18 primers (Thermo Scientific), random hexamer primers (Thermo Scientific), and AMV reverse transcriptase (Promega). cDNA was diluted to 5 ng/µL. Gene expression was assayed by real-time quantitative PCR using Maxima SYBR Green qPCR Master Mix (Thermo Scientific) on a Bio-Rad C1000 real-time PCR system. The SYBR Green qPCR reactions contained 10 μL of 2× SYBR Green Master Mix, 2 μL of diluted cDNA, and 8 pmol of both forward and reverse primers. The reactions were run using the following conditions: 50 °C for 2 min, 95 °C for 10 min, followed by 40 cycles of 95 °C for 5 s and 60 °C for 30 s. Relative values for each transcript were normalized to 18S rRNA. Gene primers used are listed in Table S2. For every qPCR run, three technical replicates per sample were used for each gene.
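The normalization to 18S rRNA can be illustrated with the widely used 2^(−ΔCt) calculation. The text states only that relative values were normalized to 18S rRNA, so the exact formula here is an assumption and the Ct values are invented. The sketch averages three technical replicates per gene, as in the protocol above; the function names are hypothetical.

```python
from statistics import mean


def relative_expression(target_cts, reference_cts):
    """2^-(mean target Ct - mean reference Ct), e.g. target gene vs 18S rRNA."""
    delta_ct = mean(target_cts) - mean(reference_cts)
    return 2.0 ** (-delta_ct)


def fold_change(sample_relative, control_relative):
    """Ratio of two normalized expression values (equivalent to 2^-ddCt)."""
    return sample_relative / control_relative


# Invented Ct triplicates: a target gene and 18S rRNA in two samples.
organoid = relative_expression([25.0, 25.0, 25.0], [10.0, 10.0, 10.0])
lung = relative_expression([24.0, 24.0, 24.0], [10.0, 10.0, 10.0])
print(organoid, lung, fold_change(lung, organoid))
```

Because each cycle corresponds to a doubling, a one-cycle lower ΔCt in the invented lung sample reads out as a two-fold higher normalized expression.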

Western blots

Organoids were collected from the culture plate with cold basal medium, washed, and centrifuged. The resulting cell pellet was resuspended in 100 μL of RIPA buffer with HALT protease inhibitor. The sample was incubated on ice for 30 min before sonication at 40% power for 10 s (Sonic Dismembrator Model 500). Sonicated cells were spun down, and supernatants were collected as cell lysates.

Cell-lysate protein concentrations were determined by Bio-Rad DC Assay. For each sample, 16–20 μg of protein was loaded. The appropriate volume of cell lysate was mixed with Tris–glycine SDS sample buffer and heated at 95 °C for 5 min. Lysate samples were immediately loaded onto 4–20% gradient gels (Bio-Rad, catalogue # 4561095) and separated by SDS-PAGE for 90 min at 120 V. The gel was wet-transferred onto a 0.45 μm nitrocellulose membrane for 1 h at 100 V. Blots were blocked in 5% milk in 1× TBST solution for 1 h at room temperature with gentle rocking, then incubated with primary antibodies in 5% milk in 1× TBST overnight at 4 °C with gentle rocking. Blots were washed three times with 1× TBST and incubated with goat anti-rabbit and goat anti-mouse IgG HRP-conjugated secondary antibodies diluted at 1:5000 with 5% milk in 1× TBST for 1 h at room temperature with gentle shaking. After washing the blots three times with 1× TBST, blots were developed with Lumi-Light Western Blotting Substrate (Roche) for 5 min in the dark and visualized on a chemiluminescence imager (Bio-Rad ChemiDoc MP).

Whole-mount organoid staining

Organoids were processed for imaging as described41. Briefly, organoids were removed from BME2 with 3× cold PBS washes then fixed in 2–3% paraformaldehyde for 30 min on ice and washed 3× in PBS. Fixed organoid samples were stored at 4 °C for up to 2 months.

For staining, organoids were blocked for several hours at room temperature in PBS supplemented with 0.5% Triton X-100, 1% DMSO, 1% BSA, and 1% donkey or goat serum. Blocking solution was removed and replaced with blocking solution containing primary antibodies diluted 1:500 or 1:250. Organoids were incubated with primary antibodies for 24 h at 4 °C. Organoids were then washed 3× in PBS and incubated with secondary antibodies diluted 1:250 in PBS at room temperature for several hours. Organoids were washed 3× in PBS and stained with Hoechst before visualization. Organoids were imaged on both Zeiss Axio Observer Z.1 and Zeiss Lightsheet Z.1 microscopes. Images were processed using a combination of the Zeiss software, ImageJ 1.51f, and Imaris 9.3. The primary antibodies used are listed in Table S3.

Confocal imaging of HAOs

HAOs were dissociated as described above. Organoids were resuspended in warm HAO medium with 2% BME2 and seeded onto glass microscopy chamber slides (Thermo Fisher, 154453) pre-coated with 1% (v/v) Geltrex (Gibco, A1413301). Organoids were allowed to embed in the Geltrex layer before the medium was aspirated. They were then fixed in 4% paraformaldehyde for 30 min on ice and washed 3× in PBS.

For staining, organoids were permeabilized in 1% Triton X-100 for 10 min at room temperature. Organoids were then blocked in 100 µL of blocking solution (5% donkey or goat serum, 1% BSA, 0.1% cold fish skin gelatin, 0.1% Triton X-100, 0.05% Tween 20) for 2 h with gentle rocking at room temperature. Organoids were then incubated with primary antibodies diluted in antibody solution (3% serum, 0.1% cold fish skin gelatin, 0.3% Triton X-100, 0.05% Tween 20) overnight at 4 °C with gentle rocking. Afterwards, organoids were washed three times in 1× PBS with 0.1% BSA and incubated with fluorophore-conjugated secondary antibodies, diluted 1:400 in antibody solution, for 2 h at room temperature with gentle rocking while protected from light. Organoids were washed three times in 1× PBS with 0.1% BSA and stained with Hoechst 33342 for 10 min at room temperature. The polyester gasket was removed from the glass slide, and a rectangular glass slide was mounted onto the HAOs with Gold antifade reagent (Invitrogen, P36934).

Images were taken on the Olympus FV3000RS confocal microscope and processed with Imaris 9.3.

Infection of HAOs with SARS-CoV-2

HAOs were dissociated and re-suspended to a final density of 100,000 cells per 300 μL of HAO medium and seeded onto culture plates pre-coated with BME2 at 37 °C for 1 h. The cells were allowed to settle and embed onto the BME2 coating for 20–60 min at 37 °C before HAO medium was added. After 24 h, the organoids were infected at an MOI of 1 for 2 h. The virus-containing medium was then removed, the cells were washed, fresh medium was added, and the cells were incubated until post-infection processing.
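The inoculum needed for a given MOI follows from the cell number and the stock titer. The sketch below works through that arithmetic for 100,000 cells at an MOI of 1; the function name and the stock titer of 1 × 10^6 PFU/mL are hypothetical values chosen for illustration and are not reported in the methods.

```python
def inoculum_volume_ul(n_cells, moi, titer_pfu_per_ml):
    """Volume of virus stock (in microliters) that delivers `moi` PFU per cell."""
    pfu_needed = n_cells * moi
    return pfu_needed / titer_pfu_per_ml * 1000.0

# 100,000 cells at MOI 1 with a hypothetical stock titer of 1e6 PFU/mL.
volume_ul = inoculum_volume_ul(100_000, 1, 1e6)
```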

Propagation of SARS-CoV-2 variants

All SARS-CoV-2 variants were propagated on Vero-E6 cells expressing human TMPRSS2 in an aerosol biosafety level 3 (aBSL3) lab. Stocks were titered via plaque assay on Vero-E6 TMPRSS2 cells as described25 and sequenced to confirm the absence of novel cell-culture mutations.

Single-cell RNA sequencing

HAOs were infected with SARS-CoV-2 as described above for 24 h. Organoids were removed from the plate by dissolving the Geltrex with cold basal medium and spun down. A single-cell suspension was made by digesting the organoids with 10× TrypLE for 15 min and passing them through a 40-μm filter. Library prep was done according to the standard 10× protocol with 6000 cells loaded into the 10× Chromium Controller.

Libraries were sequenced on a NovaSeq6000, and reads were aligned with Cell Ranger v.6.0.0 (10× Genomics, STAR aligner) to the human GRCh38 reference with the complete SARS-CoV-2 genome (MN985325.1) appended42.

Analysis of SARS-CoV-2 sequencing

After alignment, cells that had > 20% Unique Molecular Identifiers (UMIs) coming from mitochondrial genes were removed. Cells were also removed if they had fewer than 200 UMIs or greater than 300,000 UMIs. After filtering, the single-cell RNA-seq dataset contained 38,462 cells and 36,611 genes.
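The filtering rule can be sketched as a single predicate per cell. This is a minimal illustration using the thresholds stated above; the function name and the example cells are hypothetical, and in practice the same rule would typically be applied as a boolean mask over the cell metadata.

```python
def passes_qc(total_umis, mito_umis,
              max_mito_frac=0.20, min_umis=200, max_umis=300_000):
    """Keep a cell unless it has <200 UMIs, >300,000 UMIs, or >20% mitochondrial UMIs."""
    if total_umis < min_umis or total_umis > max_umis:
        return False
    return mito_umis / total_umis <= max_mito_frac

# Hypothetical cells as (total UMIs, mitochondrial UMIs).
cells = [(5_000, 400), (150, 10), (8_000, 2_400), (350_000, 1_000)]
kept = [c for c in cells if passes_qc(*c)]
```

Only the first example cell survives: the second has too few UMIs, the third has 30% mitochondrial UMIs, and the fourth exceeds the upper UMI limit.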

scRNA-seq analysis with scVI

We used the Seurat v3 method43 as implemented in Scanpy v.1.8.144 to select the top 4000 highly variable genes from the dataset, excluding the SARS-CoV-2 genes and ACE2. We then ran scVI26 as implemented in scvi-tools v.0.13.027 with default parameters and with each sequencing round treated as a batch covariate. This resulted in a single latent space for all the data. We visualized the data by running the Scanpy function scanpy.pp.neighbors, followed by scanpy.tl.umap, on the scVI latent space for each round of sequencing independently.

Cell-type annotation

We first used seven different automated cell-type annotation methods to get a consensus prediction using the Lung Cell Atlas as a reference dataset28. The seven methods were: (1) OnClass45, (2) SVM46, (3) Random Forest (as implemented in sklearn)47, (4) KNN after batch correction with BBKNN48, (5) KNN after batch correction with Scanorama49, (6) KNN trained on the Lung Cell Atlas28 scVI latent space26, (7) scANVI50. The resulting prediction was the majority prediction over the seven methods. We also derived a consensus score, defined as the number of methods that agreed with the consensus prediction. After manual inspection, we determined that the only cell types present in the dataset were “Basal Cells”, “Goblet Cells”, and “Lung Ciliated Cells”. All cells that had a consensus score less than 5 or that were not predicted as a basal/goblet/ciliated cell were reclassified with a KNN trained on cells that had a consensus score greater than 5 and were predicted as a basal/goblet/ciliated cell.
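The majority vote and consensus score can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the seven predictions are made-up labels, and ties are broken by first occurrence, which is an assumption because the methods do not describe tie handling.

```python
from collections import Counter

# Cell types retained after manual inspection (from the text above).
ALLOWED = {"Basal Cells", "Goblet Cells", "Lung Ciliated Cells"}

def consensus(predictions):
    """Return (majority label, number of methods agreeing with it)."""
    label, score = Counter(predictions).most_common(1)[0]
    return label, score

def needs_reclassification(label, score, min_score=5):
    """Flag cells with a consensus score below 5 or a label outside ALLOWED."""
    return score < min_score or label not in ALLOWED

# Hypothetical predictions from the seven methods for one cell.
votes = ["Basal Cells"] * 5 + ["Goblet Cells", "Club Cells"]
label, score = consensus(votes)
flagged = needs_reclassification(label, score)
```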

Correlation analysis

To identify the top genes correlated with increased SCV2 levels, we computed the Pearson correlation as implemented in SciPy v1.4.1 (scipy.stats.pearsonr). For each gene in each sample individually, we calculated the correlation between the log-normalized gene counts, computed with Scanpy44 v.1.8.1 (scanpy.pp.normalize_total(adata, target_sum = 1e4) followed by scanpy.pp.log1p), and the sum of all SCV2 RNA transcripts per sample. Then, for each sample, we retained only genes expressed in at least 20% of cells in the sample.
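A dependency-free sketch of this calculation for one sample is shown below. It re-implements the normalization and Pearson correlation in plain Python rather than calling Scanpy or SciPy, so it is an illustration of the arithmetic and not the authors' code. The toy count matrix, the per-cell viral totals, the function names, and the choice to normalize over host genes only are assumptions, and every cell is assumed to have a non-zero total count.

```python
import math

def log_normalize(cell_counts, target_sum=1e4):
    """Scale one cell's counts to `target_sum`, then apply log(1 + x)."""
    total = sum(cell_counts)
    return [math.log1p(c * target_sum / total) for c in cell_counts]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlate_with_viral_load(counts, viral_totals, min_frac=0.20):
    """Return {gene index: r} for genes detected in at least `min_frac` of cells."""
    n_cells = len(counts)
    normed = [log_normalize(row) for row in counts]
    result = {}
    for g in range(len(counts[0])):
        frac_expressing = sum(1 for row in counts if row[g] > 0) / n_cells
        if frac_expressing < min_frac:
            continue  # gene detected in too few cells
        result[g] = pearson([row[g] for row in normed], viral_totals)
    return result

# Toy data: 6 cells x 3 genes (raw counts) and viral UMI totals per cell.
counts = [[1, 9, 0], [2, 8, 0], [5, 5, 0], [8, 2, 0], [3, 7, 0], [4, 5, 1]]
viral_totals = [0, 10, 50, 100, 20, 40]
r = correlate_with_viral_load(counts, viral_totals)
```

In this toy example, gene 0 rises with viral load, gene 1 falls with it, and gene 2 is dropped because it is detected in only one of six cells.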

DE testing and GO analysis

We used DESeq2 (v1.34.0) to identify genes differentially expressed between WT HAOs and ACE2-overexpressing HAOs by pseudobulking the transcript counts of each sample51. We used Scanpy v.1.8.1 to identify differentially expressed genes between cells in the infected vs uninfected conditions44. Metascape (v 3.5) with default Express Analysis settings was used to identify enriched Gene Ontology terms52.

Classifying infection with Gaussian mixture models

To classify whether a cell was actually infected with SCV2 (not just exposed to the virus), we used a Gaussian mixture model (GMM) to determine whether the SCV2 mRNA UMIs in that cell were due to background viral mRNA or to virus replicating within the cell. We used the scikit-learn v.1.0.1 GMM implementation (sklearn.mixture.GaussianMixture) with default parameters. The Bayesian information criterion (BIC)53, as implemented in scikit-learn, was used for model selection; it is defined as BIC = −2 log(L) + d log(N), where L is the maximized likelihood of the GMM, N is the number of samples, and d is the number of degrees of freedom. For each experimental condition, we compared GMMs with two vs three components, and we also compared training a single model on log(sum of viral mRNA per cell + 1) with training a separate model on log(viral mRNA counts + 1) for each viral gene. When training separate models, we trained a GMM for each viral gene, summed the posterior probabilities of each component across the models, and assigned each cell the component with the greatest total posterior probability. The BIC was then calculated as the mean BIC of the individual models. For each experimental condition, we selected the model with the lowest BIC. In all the uninfected experimental conditions (cells not exposed to virus), the optimal model was a two-component GMM trained on log(sum of viral mRNA per cell + 1), whereas in all the infected experimental conditions (cells exposed to virus), the optimal model was a three-component GMM trained on log(viral mRNA counts + 1) for each viral gene. For the two-component GMM, we interpreted the component with the lower mean as “No Infection” and the component with the higher mean as “Yes Infection”. All cells in the uninfected condition were classified as “No Infection”, as expected for a negative control. For the three-component GMM, we interpreted the component with the lowest mean as “No Infection”, the middle mean as “Low Infection”, and the highest mean as “High Infection”.
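The model-selection score and the per-gene posterior aggregation can be sketched in plain Python. The sketch takes log-likelihoods and posterior probabilities as given; with scikit-learn these would come from a fitted GaussianMixture (its bic and predict_proba methods). The function names and the numbers are hypothetical, and the sketch assumes that components are ordered consistently across the per-gene models (for example, sorted by mean), which the methods do not specify.

```python
import math

LABELS = ["No Infection", "Low Infection", "High Infection"]

def bic(log_likelihood, n_samples, n_params):
    """BIC = -2 log(L) + d log(N); lower values indicate a preferred model."""
    return -2.0 * log_likelihood + n_params * math.log(n_samples)

def mean_bic(bics):
    """Mean BIC over the individual per-gene models."""
    return sum(bics) / len(bics)

def assign_components(posteriors_per_gene):
    """Sum posteriors over per-gene models and pick the best component per cell.

    posteriors_per_gene[g][c][k] is the posterior probability of component k
    for cell c under the model trained on viral gene g.
    """
    n_cells = len(posteriors_per_gene[0])
    n_components = len(posteriors_per_gene[0][0])
    labels = []
    for c in range(n_cells):
        totals = [sum(model[c][k] for model in posteriors_per_gene)
                  for k in range(n_components)]
        labels.append(max(range(n_components), key=totals.__getitem__))
    return labels

# Hypothetical posteriors: 2 viral genes x 2 cells x 3 components.
posteriors = [
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]],
]
assigned = [LABELS[k] for k in assign_components(posteriors)]
```

Summing posteriors lets a cell be called “High Infection” when the combined evidence across viral genes favors it, even if one per-gene model alone prefers another component, as for the second cell here.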