Introduction

Hematopoietic stem and progenitor cells (HSPCs) in vertebrates sustain lifelong hematopoiesis through self-renewal and multilineage differentiation.1 Clinically, these cells are the essential component of bone marrow transplants, which are widely used to treat patients with leukemia and congenital blood disorders. Recent studies have demonstrated that HSPCs are heterogeneous at the single-cell level, with distinct molecular features and functional properties.2,3,4 HSPC heterogeneity is thought to arise from both cell-extrinsic microenvironmental cues and cell-intrinsic regulatory factors.2 A better understanding of how heterogeneous HSPCs are specified could provide new insights for developing more efficient cell replacement-based therapies and HSPC-based gene therapies to treat hematologic diseases.

HSPCs are specified during embryogenesis by trans-differentiation of endothelial cells (ECs) in the ventral wall of the dorsal aorta (VDA) in the aorta-gonad-mesonephros (AGM) region. This process is known as endothelial-to-hematopoietic transition (EHT). During EHT, a subset of ECs in the aortic floor develop into hemogenic endothelial cells (HECs) and further produce HSPCs. Evidence from a growing number of studies suggests that the nascent HSPCs are heterogeneous.5,6,7 For example, genetic tracing in mice reveals embryonic-born hematopoietic stem cells (HSCs) and progenitors with distinct differentiation potentials and lifelong contributions.8,9 Furthermore, genetic manipulation studies show that Evi1 expression and microRNA-223-mediated N-glycan biosynthesis regulate heterogeneous HSPC fate determination at birth.7,9 Given that embryonic HSPCs are derived from HECs, it is possible that HSPC heterogeneity is pre-determined at the HEC stage. Using lineage tracing analysis, HECs in the AGM and posterior blood island of zebrafish embryos have been shown to possess different hematopoietic lineage outcomes.10 Similarly, clonal assays and transcriptomics analysis in mouse embryos also show that HECs distinguished by different immunophenotypes and transcriptional features can generate heterogeneous stem/progenitor cell subtypes.11 Thus, these reports suggest that HSPC heterogeneity is likely attributable to HEC heterogeneity. However, a comprehensive understanding of the exact origin, molecular features and regulatory mechanisms of heterogeneous HSPCs during embryogenesis is still lacking.

Because zebrafish offer unique advantages for in vivo imaging and genetic manipulation, they are a powerful model for dissecting the origin and molecular mechanisms of heterogeneous HSPC generation. Here, we establish a single-cell transcriptomic and chromatin accessibility landscape of EHT in zebrafish embryos. Integrative single-cell multi-omics analysis defines the exact HEC and HSPC subtypes based on transcriptional and epigenetic heterogeneities. Furthermore, lineage tracing and single-cell transcriptome analysis of spi2+ lineages visualize the generation and embryonic contribution of lymphoid/myeloid-biased HSPCs (L/M-HSPCs) and reveal that lymphoid/myeloid priming of HSPCs can be determined as early as the HEC stage. Mechanistically, Spi2 controls L/M-HSPC fate acquisition by directly activating the transcription of master hematopoietic regulators while repressing the endothelial program in HECs. Collectively, our study suggests that lineage competence is endowed earlier during embryonic development than was previously appreciated and identifies a new molecular determinant for lineage-biased HSPC generation across vertebrates.

Results

Single-cell transcriptomics and epigenomics analyses of HEC/HSPC heterogeneity and the regulatory programs

To generate a comprehensive transcriptomic and chromatin accessibility map of EHT at the single-cell level, we performed single-cell transcriptomics (scRNA-seq) and single-cell chromatin accessibility (scATAC-seq) analyses of endothelial and hematopoietic components from zebrafish embryos (Fig. 1a). We employed the transgenic line Tg(kdrl:mCherry/runx1:enGFP)12 to enrich ECs (kdrl+runx1−), HECs (kdrl+runx1+) and hematopoietic cells (kdrl−runx1+) in the AGM region at 36 hours post fertilization (hpf) (Fig. 1a; Supplementary information, Fig. S1a, b), which is the time point of HEC specification and HSPC emergence.12,13,14,15 For the scRNA-seq analysis, after filtering low-quality cells, we obtained a total of 4583 cells, with a median of 2065 genes detected per cell (Supplementary information, Fig. S1c). Based on the expression of specific marker genes, these cells were annotated into 17 different cell types (Supplementary information, Fig. S1d–f and Tables S1, S2). We next established an epigenetic map of EHT by measuring chromatin accessibility across 3908 single cells and obtained ten distinct clusters according to the gene body accessibility of cell type-specific genes (Supplementary information, Fig. S2a–c and Tables S1, S2). Altogether, these scRNA-seq and scATAC-seq datasets provide comprehensive cell type-specific transcriptional and chromatin accessibility landscapes of embryonic HSPC generation in zebrafish.
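
The cell-filtering and per-cell summary step described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the gene-count cutoff (min_genes = 200), the function names and the toy cell identifiers are hypothetical, chosen only to show how low-quality cells are dropped before the median number of detected genes per cell is computed.

```python
from statistics import median


def filter_cells(genes_per_cell, min_genes=200):
    """Keep cells with at least `min_genes` detected genes.

    The cutoff is a hypothetical value for illustration; the source
    does not state the thresholds that were actually applied.
    """
    return {cell: n for cell, n in genes_per_cell.items() if n >= min_genes}


def median_genes(genes_per_cell):
    """Median number of detected genes across the retained cells."""
    return median(genes_per_cell.values())


# Toy input: number of detected genes per cell (invented values).
toy_counts = {"cell_a": 2100, "cell_b": 150, "cell_c": 1980, "cell_d": 2300}

kept = filter_cells(toy_counts)
print(len(kept), "cells retained; median genes per cell:", median_genes(kept))
```

In practice, filters of this kind are usually combined with other criteria (for example, total counts or mitochondrial read fraction), which are omitted here for brevity.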

Fig. 1: Single-cell transcriptomic and epigenetic analysis of HEC/HSPC heterogeneity.

a Schematic representation of the strategies used for cell preparation and scRNA-seq and scATAC-seq of zebrafish kdrl+runx1− cells, kdrl+runx1+ cells and kdrl−runx1+ cells in AGM region at 36 hpf. b UMAP plots showing endothelial and hematopoietic populations based on scRNA-seq and scATAC-seq. c Dot plots showing the average and percentage expression of representative endothelial and hematopoietic signature genes in the VEC, AEC, HEC, L-HSPC, M-HSPC and E-HSPC. d TF overlay on single-cell ATAC UMAP of TF chromVAR deviations (top) and gene-activity scores (middle), and single-cell RNA expression level (bottom) for etv2, runx1, cmyb, gata1 and spi1b. e UMAP plots showing HEC subpopulations based on scRNA-seq. f Dot plots showing the average and percentage expression of representative endothelial and hematopoietic signature genes in the pre-HEC, L/M-HEC and E-HEC. g Major GO terms of genes with HEC subtype-specific expression and accessible promoter. h Heatmap showing the enrichment of cluster-specific TF-binding motifs. i Integrated scRNA-seq and scATAC-seq showing genes with cell type-specific expression and activity in L/M-HEC and E-HEC, respectively.

To further characterize the HEC/HSPC heterogeneity, we focused on EC and HEC/HSPC clusters. Subclustering analysis of EC and HEC/HSPC clusters identified venous EC (VEC, nr2f1a, flt4 and aplnra), arterial EC (AEC, ephrnB2, dll4 and cxcr4a), HEC (gfi1aa, gata2b and runx1) and HSPC (cmyb and ptprc) subclusters (Fig. 1b, c). Differentially expressed gene (DEG) analysis revealed the existence of lymphoid-biased HSPC (L-HSPC), which expresses ikaros and rac2, myeloid-biased HSPC (M-HSPC), which expresses myeloid genes spi1b, lcp1 (l-plastin) and ccr9a, and erythroid-biased HSPC (E-HSPC), which expresses erythroid genes gata1, drl and tal1 (Fig. 1c, d). Accordingly, subclustering of EC and HEC/HSPC clusters from our scATAC-seq data characterized the AEC (cdh5, flt1, notch1a and klf2a), VEC (kdrl and flt4), HEC (runx1 and cmyb), L/M-HSPC (spi1b, il1b, lcp1, lyz and rac2) and E-HSPC (gata1 and hbae5) (Fig. 1b, d), suggesting the existence of nascent HSPC heterogeneity in the AGM region. Next, we scrutinized the HEC subtypes by performing correlation analysis of accessible promoters and gene expression signatures at the HEC specification stage (Fig. 1e). Pre-HEC was characterized by high expression of the vascular gene pecam1 and by enrichment of Gene Ontology (GO) terms related to cell population proliferation and DNA metabolic process; lymphoid/myeloid-biased HEC (L/M-HEC) showed lymphoid/myeloid gene expression (e.g., coro1a and snrpg) as well as enrichment of GO terms related to myeloid cell differentiation and regulation of neutrophil chemotaxis; erythroid-biased HEC (E-HEC) highly expressed genes associated with regulation of heme biosynthetic process and erythrocyte differentiation, as evidenced by expression of the hemoglobin biogenesis-related genes gata1 and hbbe2 (Fig. 1f, g; Supplementary information, Table S3).
Trajectory analysis using VECTOR displayed a branched and continuous path from AEC to heterogeneous HSPCs passing through pre-HEC, L/M-HEC and E-HEC, respectively (Supplementary information, Fig. S2d), suggesting that pre-HEC represented a cell population initiating the EHT, while L/M-HEC and E-HEC had been primed towards the lineage-biased HSPCs. The developmental path inferred by scATAC-seq also showed that the HEC represented a transitioning cell state that developed from EC and further generated L/M-HSPC and E-HSPC (Supplementary information, Fig. S2e), accompanied with dynamic changes in the accessibility of EC-, HEC- and lineage-specific hematopoietic genes (Supplementary information, Fig. S2f). To further support the cell type identification and developmental mapping, we showed concordance between the two separate single-cell measurements (Supplementary information, Fig. S2g). Collectively, these multi-omic maps reveal that transcriptional lineage priming potentially originates from the HEC populations and the accessible cis-regulatory elements constitute the regulatory programs for EHT and subsequent lineage specification of nascent HSPCs.

To explore the potential regulators of the HEC heterogeneity, we focused on the top-ranked transcription factors (TFs) identified by scATAC-seq and found that SPI motifs (SPI1, SPIB and SPIC) were enriched in HEC (Fig. 1h). Among them, the SPI1 motif showed higher accessibility in both HEC and L/M-HSPC (Fig. 1h). Based on analysis of synteny conservation, we found that spi1a, spi1b and spi2 in zebrafish were closely related to the SPI1 subfamily (Supplementary information, Fig. S3a). Correlation analysis of transcriptome and chromatin accessibility of HECs confirmed that spi2, the most highly expressed member of the SPI1 family, was one of the top-enriched TFs and showed a higher level of gene activity in L/M-HEC, but not in E-HEC (Fig. 1i; Supplementary information, Fig. S3b–d). The other two well-annotated spi genes, spi1a and spi1b, showed no specific expression in HEC (Fig. 1d; Supplementary information, Fig. S3e). Taken together, integrative scRNA-seq and scATAC-seq analyses characterize the HEC and HSPC heterogeneity and suggest that the ETS family TF spi2 is involved in lymphoid/myeloid-biased EHT programs.

Lineage tracing of spi2+ HECs and their contribution to embryonic lymphoid and myeloid lineages

To determine the potential role of Spi2 in regulation of heterogeneous EHT, we first examined its expression by whole-mount in situ hybridization (WISH), fluorescence in situ hybridization (FISH) and quantitative real-time PCR (qPCR). spi2 was specifically expressed in the AGM region from 24 hpf to 36 hpf, and in the caudal hematopoietic tissue (CHT) region at 3 days post fertilization (dpf), and co-localized with the endothelial gene fli1a (Supplementary information, Fig. S3f–h). qPCR result confirmed the enriched expression of spi2 in HECs (Supplementary information, Fig. S3i).

To perform in vivo lineage tracing of the spi2+ HECs, we generated a transgenic reporter line, Tg(spi2:Gal4), by inserting a P2A-Gal4 cassette at the 3′ end of the endogenous spi2-coding sequence using the CRISPR/Cas9-mediated gene knock-in strategy (Fig. 2a).16 When crossing the Tg(spi2:Gal4) to Tg(UAS:GFP), we observed that GFP+ cells in Tg(spi2:Gal4/UAS:GFP) were found in the VDA, CHT, thymus and kidney at 28 hpf, 3 dpf and 5 dpf, respectively (Fig. 2b), which mimicked the gene expression pattern of spi2 as assessed by WISH (Supplementary information, Fig. S3f). By outcrossing different transgenic reporter lines, we further showed that spi2+ mCherry and GFP signals in Tg(spi2:Gal4/UAS:mCherry) and Tg(spi2:Gal4/UAS:GFP) could be readily detected in the runx1:enGFP+ and sclα:dsRed+ HEC/HSPCs, respectively (Fig. 2c). Time-lapse imaging revealed the emergence of spi2:GFP+ cells from the VDA through the typical flat-to-round morphological change (Fig. 2d; Supplementary information, Video S1), confirming that spi2+ HECs underwent EHT and eventually became HSPCs. Next, to determine whether spi2+ HECs can give rise to hematopoietic derivatives, we generated Tg(spi2:Gal4/UAS:Kaede), in which the spi2+ HECs are specifically marked by the photoconvertible fluorescent protein Kaede in the VDA region (Fig. 2e). The Kaede+ HECs in Tg(spi2:Gal4/UAS:Kaede) were converted with a 405-nm UV laser at 28 hpf. After photoconversion, the Kaede protein switches irreversibly from the green- to the red-emitting form, and a single photoconverted Kaede+ HEC can be easily followed via time-lapse imaging at different developmental stages (Fig. 2e). Confocal imaging showed that abundant red-fluorescent progenies appeared in the CHT and thymus in post-photoconverted Tg(spi2:Gal4/UAS:Kaede) embryos (Fig. 2f), indicating that the hematopoietic pool in the CHT and thymus was derived from spi2+ HECs.
Furthermore, to interrogate whether ablation of spi2+ cells could affect their hematopoietic derivatives, spi2+ cells in Tg(spi2:Gal4;UAS:nfsb-mCherry) were specifically ablated via metronidazole (MTZ)-induced cytotoxicity from 28 hpf to 48 hpf. Live imaging showed a nearly complete absence of spi2+ hematopoietic lineages in the CHT and a great reduction in the thymus in MTZ-treated embryos, compared with control embryos at 4 dpf (Fig. 2g–i). WISH confirmed that the expression of HSPC marker gene, cmyb, and hematopoietic derivative marker genes, lyz (myeloid) and rag1 (lymphoid), but not gata1 (erythroid), was decreased at 4 dpf in the spi2+ cell-ablated embryos (Fig. 2j–l), indicating that ablation of spi2+ cells could decrease the number of L/M-HSPCs but not E-HSPCs. Taken together, spi2 is specifically expressed in HEC/HSPCs, and spi2+ HECs are responsible for the establishment of embryonic hematopoietic lineages, especially the lymphoid and myeloid cells.

Fig. 2: Lineage tracing of spi2+ HECs.

a Schematic illustration of intron targeting-mediated Gal4 knock-in strategy at the zebrafish spi2 locus by using the CRISPR/Cas9 system. b GFP+ cells of Tg(spi2:Gal4/UAS:GFP) embryos in the trunk, CHT, thymus and kidney at 28 hpf, 3 dpf and 5 dpf. Yellow dashed lines highlight the vascular system, CHT, thymus, and kidney. Scale bar, 50 μm. c Confocal imaging of the AGM region of the embryos obtained by outcrossing the Tg(spi2:Gal4;UAS:GFP) with Tg(runx1:enGFP) and Tg(sclα:dsRed) at 32 hpf and 36 hpf, respectively. White arrows and arrowheads denote HECs or emerging HSPCs. Scale bar, 50 μm. d Time-lapse imaging between 32 hpf and 52 hpf of Tg(kdrl:mCherry/spi2:Gal4;UAS:GFP). Numbers in white indicate recording time in hours, minutes, and seconds. White arrowheads denote one kdrl:mCherry+spi2:Gal4/UAS:GFP+ cell in the VDA. Scale bars, 50 μm. See also Supplementary information, Video S1. e Imaging of Kaede+ cells before (green) and after (red) photoconversion in the VDA at 28 hpf. Scale bars, 200 μm (bright field) and 20 μm (fluorescent field) (n = 10 embryos). f Imaging of VDA-derived converted (red and yellow) and unconverted (green) hematopoietic cells in the CHT and thymus at 4 dpf. Boxes depict the CHT and dashed circle depicts the thymus. Scale bar, 50 μm (n = 10 embryos). g–i Imaging (g) and statistical data (h, i) of mCherry+ cells (red) in the CHT and thymus of Tg(spi2:Gal4/UAS:nfsb-mCherry) at 4 dpf in DMSO- and MTZ-treated embryos. Dashed boxes depict the CHT and dashed circles depict the thymus. Scale bar, 50 μm. The results are represented as means ± SD. Student’s t-test, ***P < 0.001, ****P < 0.0001. j–l WISH (j) and quantification (k, l) showing the expression of cmyb, gata1, lyz and rag1 in the CHT and thymus of Tg(spi2:Gal4/UAS:nfsb-mCherry) at 4 dpf after MTZ treatment. Scale bar, 100 μm. The results are represented as means ± SD. Student’s t-test, ****P < 0.0001, n.s., no significance.

Molecular profiling of spi2+ HECs and their stress-responsive function

To further characterize the transcriptome signatures of spi2+ lineages, we isolated spi2+ hematopoietic lineages (kdrl+spi2+ and kdrl−spi2+) in Tg(kdrl:mCherry/spi2:Gal4;UAS:GFP) at 36 hpf, and performed a modified SMART-Seq2 protocol, named single-cell tagged reverse transcription and sequencing (STRT-seq).17 DEG analysis identified three cell clusters, including early HEC (kdrl, cdh5 and dll4), late HEC (runx1 and gata2b) and HSPC (cmyb) (Fig. 3a, c; Supplementary information, Fig. S4a and Table S4). Pseudo-temporal analysis by Monocle2 and RNA velocity showed the developmental path from early HEC through late HEC to HSPC (Fig. 3b; Supplementary information, Fig. S4b), accompanied by downregulated expression of endothelial/HEC genes (cdh5, dll4, runx1 and gata2b) and sustained expression of the HSPC gene cmyb (Fig. 3c). Meanwhile, significant enrichment of lymphoid/myeloid signatures (spi1b, coro1a, cebpa, ikaros and lyz), but not erythroid signatures (tal1, hemgn, alad and gata1), was observed in spi2+ HEC and HSPC (Fig. 3d), suggesting that spi2+ HECs have lymphoid and myeloid potential. Moreover, GO analysis showed that terms related to inflammatory signaling and cell cycle regulation were highly enriched in early HEC, while terms related to chromatin reorganization and definitive hematopoiesis were highly enriched in late HEC (Fig. 3e; Supplementary information, Fig. S4c and Table S5), implying that inflammatory signaling might favor lymphoid/myeloid hematopoietic fate acquisition in HECs. Overall, these results reveal that the lymphoid and myeloid lineage potential is endowed as early as in spi2+ HECs.

Fig. 3: STRT-seq of spi2+ lineages and the stress-induced response of spi2+ HEC by LPS treatments.

a UMAP plots showing spi2+ lineages from Tg(kdrl:mCherry/spi2:Gal4;UAS:GFP) embryos at 36 hpf. b Monocle2 analysis showing the pseudo-temporal developmental trajectory of spi2+ clusters. c Cell type-specific signature gene expression in the clusters ordered along the pseudo-temporal developmental trajectory. d Hematopoietic lineage-specific signature gene expression in the early HEC, late HEC and HSPC. e GO enrichment analysis revealing the functional annotation of gene modules highly correlated with early and late HECs, respectively. f qPCR showing the expression of pro-inflammatory cytokines, il1b, il6 and il8 in PBS- and LPS-treated embryos at 36 hpf. The results are represented as means ± SD. Student’s t-test, *P < 0.05, **P < 0.01. g Confocal imaging and statistical data showing the number of spi2+ HECs and HSPCs in the AGM (arrowheads) in PBS- and LPS-treated embryos at 36 hpf. Scale bar, 50 μm. The results are represented as means ± SD. Student’s t-test, *P < 0.05. h qPCR showing the expression of HEC/HSPC markers runx1 and cmyb in PBS- and LPS-treated embryos at 36 hpf. The results are represented as means ± SD. Student’s t-test, **P < 0.01. i–k WISH (i) and quantification (j, k) showing the expression of cmyb, gata1, l-plastin, mfap4 and ikaros in the CHT and thymus in PBS- and LPS-treated embryos at 4 dpf. Scale bar, 100 μm. The results are represented as means ± SD. Student’s t-test, ****P < 0.0001, n.s., no significance.

Based on the above findings, we speculated that inflammatory stress could induce or enhance L/M-HEC and L/M-HSPC generation. To test the response of spi2+ HECs to acute inflammation, we established a bacterial infection model by administering a sub-lethal dose of bacterial lipopolysaccharide (LPS) at 28 hpf. LPS treatment increased expression of the pro-inflammatory cytokine genes il1b, il6 and il8 at 36 hpf as indicated by qPCR (Fig. 3f), reflecting enhanced inflammation following LPS stimulation. Furthermore, live imaging of Tg(kdrl:mCherry/spi2:Gal4;UAS:GFP) showed that the number of spi2+ HECs was elevated in the LPS-treated embryos (Fig. 3g). Moreover, LPS treatment increased expression of runx1 and cmyb at 36 hpf as shown by qPCR (Fig. 3h). WISH data showed that the expression of cmyb, the neutrophil-specific marker l-plastin and the macrophage-specific marker mfap4 in the CHT, and the expression of the lymphocyte progenitor marker ikaros in the thymus, were evidently increased at 3.5 dpf (Fig. 3i–k), while the expression of gata1 in the CHT remained unaltered in LPS-treated embryos (Fig. 3i, j). Therefore, these data demonstrate that spi2+ L/M-HECs are involved in the response to acute inflammation, suggesting their potential role in emergency hematopoiesis during embryogenesis.

spi2 deficiency leads to impaired lymphoid/myeloid-biased HEC specification

To further determine the function of spi2 during HSPC generation, we generated a spi2-null mutant (spi2–/–) with a 5-bp insertion in the third exon (Supplementary information, Fig. S5a–c). The efficacy of spi2 knockout (KO) was further validated by western blot analysis (Supplementary information, Fig. S5d). Confocal imaging analysis showed that the number of HECs (kdrl+runx1+) in the VDA at 36 hpf and the number of HSPCs (CD41+gata1−) in the CHT at 3 dpf were significantly reduced in spi2–/– embryos (Fig. 4a, b). Concomitantly, WISH and qPCR showed that the expression of HEC/HSPC genes (runx1 and cmyb) and of myeloid and lymphoid genes, but not the erythroid gene, was also downregulated significantly in spi2–/– embryos at 36 hpf and 4 dpf, respectively (Fig. 4c–h). Consistently, the hematopoietic phenotypes of spi2-knockdown (KD) embryos (spi2 morpholino (MO)-injected embryos) largely phenocopied those in spi2–/– mutants (Supplementary information, Fig. S5e–i). Moreover, overexpression of spi2 increased the number of HECs (kdrl+runx1+) and HSPCs (kdrl−runx1+) in the VDA (Supplementary information, Fig. S6a–c). Furthermore, the fli1a promoter-driven EC-specific overexpression of spi2 efficiently increased the number of HSPCs and rescued the HSPC defects in spi2–/– mutants at 36 hpf (Supplementary information, Fig. S6d–h), demonstrating that spi2 regulates HEC/HSPC generation in an EC-autonomous manner. Together, these results show that EC-specific spi2 is required for HSPC generation.

Fig. 4: spi2 is required for the specification of L/M-HSPCs.

a Confocal imaging and statistical data of Tg(kdrl:mCherry/runx1:enGFP) showing the numbers of HECs and emerging HSPCs in the AGM (arrowheads) in sibling and spi2–/– embryos. Scale bar, 50 μm. The results are represented as means ± SD. Student’s t-test, ****P < 0.0001. b Confocal imaging and statistical data of Tg(CD41:GFP/gata1:dsRed) showing the number of CD41+gata1− HSPCs in the CHT region of sibling and spi2–/– embryos at 3 dpf. Scale bar, 50 μm. The dashed boxes indicate the region of HSPC counting. The results are represented as means ± SD. Student’s t-test, *P < 0.05. c, d WISH showing the expression of HEC/HSPC markers runx1 and cmyb in sibling and spi2–/– embryos at 36 hpf. Scale bar, 50 μm. The results are represented as means ± SD. Student’s t-test, ****P < 0.0001. e–g WISH (e) and quantification (f, g) showing the expression of hematopoietic lineage markers gata1, pu.1 and rag1 in the CHT and thymus in sibling and spi2–/– embryos at 4 dpf. Scale bar, 50 μm. The results are represented as means ± SD. Student’s t-test, ****P < 0.0001, n.s., no significance. h qPCR showing the expression of cmyb, ikaros, lyz, pu.1 and gata1 in sibling and spi2–/– embryos at 4 dpf in the CHT. The results are represented as means ± SD. Student’s t-test, *P < 0.05, **P < 0.01, ***P < 0.001, n.s., no significance.

To further investigate the earliest role of spi2 in regulating lineage competence during EHT, we sorted ECs (kdrl+runx1−), HECs (kdrl+runx1+) and hematopoietic cells (kdrl−runx1+) from control and spi2 morphants at 36 hpf, and performed scRNA-seq analysis (Supplementary information, Fig. S7a, b and Table S6). Uniform Manifold Approximation and Projection (UMAP) analysis showed overall similar cell types in the control and spi2 morphant datasets; however, the HEC/HSPC cluster (C2) was largely absent in spi2 morphants (Fig. 5a). To further assess whether the EHT subclusters were affected by spi2 loss, we quantified the relative numbers of AEC, HEC subtypes (pre-HEC, L/M-HEC and E-HEC), and HSPC subtypes (L-HSPC, M-HSPC and E-HSPC) in spi2 morphants and control embryos. In spi2 morphants, a decrease in the numbers of L/M-HECs, L-HSPCs and M-HSPCs, but not pre-HECs, E-HECs and E-HSPCs, was observed (Fig. 5b). This result indicated that spi2 deficiency led to the impaired generation of L/M-primed HECs and HSPCs but not E-primed HECs and HSPCs, suggesting that Spi2 is required to promote the formation of L/M-HSPCs but not E-HSPCs, and thus these two heterogeneous HSPC populations are likely regulated separately during HSPC formation. Notably, the number of AECs in spi2 morphants was much higher than that in control (Fig. 5b), implying the accumulation of AECs upon spi2 depletion. To clarify this, we performed time-lapse imaging of EHT in Tg(kdrl:mCherry/cmyb:eGFP), which showed that endothelial cells (kdrl+cmyb−) in spi2–/– embryos maintained their endothelial identity and failed to convert into HSPCs (kdrl−cmyb+), in contrast to those in wild-type siblings (Supplementary information, Videos S2 and S3). In addition, the proliferation and apoptosis status of ECs in the VDA appeared to be unaffected by spi2 loss at 36 hpf (Supplementary information, Fig. S7c, d), suggesting that the observed HSPC phenotype was not attributable to abnormal survival or proliferation of ECs. Taken together, these single-cell transcriptomes and functional assays imply that spi2 regulates the balance of AEC fate versus L/M-HEC/HSPC fate choice during HSPC generation.
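
The comparison of relative cluster sizes between control and morphant datasets can be sketched in a few lines of Python. This is an illustrative sketch only, not the analysis code of this study: the function name and the toy cluster labels per cell are hypothetical, and the numbers are invented to show the calculation, not to reproduce Fig. 5b.

```python
from collections import Counter


def cluster_percentages(labels):
    """Return the percentage of cells assigned to each cluster label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cluster: 100.0 * n / total for cluster, n in counts.items()}


# Toy per-cell cluster annotations (invented for illustration).
control_labels = ["AEC"] * 4 + ["L/M-HEC"] * 4 + ["E-HEC"] * 2
morphant_labels = ["AEC"] * 7 + ["L/M-HEC"] * 1 + ["E-HEC"] * 2

control_pct = cluster_percentages(control_labels)
morphant_pct = cluster_percentages(morphant_labels)

# Report the change in each cluster's share between the two conditions.
for cluster in control_pct:
    change = morphant_pct.get(cluster, 0.0) - control_pct[cluster]
    print(f"{cluster}: {control_pct[cluster]:.1f}% -> "
          f"{morphant_pct.get(cluster, 0.0):.1f}% ({change:+.1f})")
```

Working with percentages rather than raw counts makes the two conditions comparable even when different numbers of cells are recovered from each sample.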

Fig. 5: Transcriptional activation of genes related to artery development and suppression of genes related to immune homeostasis in spi2-KD HECs.

a UMAP plots of control and spi2 morphants. White arrows indicate the HEC/HSPC cluster. b Percentage of cells in each EHT scRNA-seq cluster. The results are represented as means ± SD. Student’s t-test, ***P < 0.001, ****P < 0.0001, n.s., no significance. c Volcano plot showing DEGs between HECs from single-cell transcriptomic dataset in control and spi2 morphants. d GO analysis showing the enrichment of up- and downregulated genes in spi2-KD HECs. e Enrichment plots for the top pathways in the morphant HECs by GSEA.

Spi2-mediated genome-wide transcriptional regulation of arterial- versus lymphoid/myeloid-biased hematopoietic programs

To identify the potential molecular mechanisms, we analyzed single-cell transcriptomes of HECs in control and spi2-KD embryos and found that 232 genes were upregulated and 850 genes were downregulated in spi2-KD HECs, compared to the control HECs (Fig. 5c; Supplementary information, Table S7). Notably, GO and DEG analyses revealed that the expression of genes related to hematopoiesis and regulation of neutrophil chemotaxis was decreased, whereas that of genes related to dorsal aorta development was increased in spi2-KD HECs (Fig. 5c, d). WISH and qPCR results validated that the expression of AEC markers was markedly increased in spi2-KO embryos (Supplementary information, Fig. S7e, f). Gene set enrichment analysis (GSEA) suggested the attenuated inflammatory response in spi2-KD HECs, including activation of immune response and cytokine production involved in immune response (Fig. 5e).

To determine whether Spi2 can directly modulate transcriptional programs in EC/HEC, we examined genome-wide Spi2 binding by CUT&Tag analysis in ECs (fli1a+) sorted from Tg(fli1a-flag-spi2-EGFP) at 36 hpf (Fig. 6a, b). The binding motif (GCTGGAAGGGCA) of Spi2 was identified as the most enriched motif by HOMER (Fig. 6c).18 Combined with DEG analysis of single-cell transcriptomes of HECs in control and spi2 morphants, a total of 112 upregulated and 430 downregulated genes were found to be bound by Spi2 (Fig. 6d; Supplementary information, Table S8). Genes involved in artery development (kdrl and mcamb), endothelial fate determination (etv2 and sox7), Notch signaling (notch1b and notch3) and BMP signaling (bmp6), were found in the overlap between Spi2-bound genes and upregulated genes in spi2-KD HECs (Fig. 6d, e). By contrast, genes involved in HSC fate determination (cbfβ) and myeloid fate determination (gfi1aa, cebpβ, l-plastin, csf1rb and ikaros) were found in the overlap between Spi2-bound genes and downregulated genes in spi2-KD HECs (Fig. 6d, e). Genome browser tracks also suggested that Spi2 can bind to the promoters of known vessel and hematopoietic regulators in HECs (Fig. 6f). Thus, CUT&Tag analysis, together with the single-cell transcriptome profiling, indicates that spi2 is capable of repressing the EC program and activating proper expression of hematopoietic lineage regulators to facilitate L/M-HEC specification.
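
The intersection of Spi2-bound genes with the up- and downregulated gene sets (the logic behind the Venn diagram in Fig. 6d) amounts to a set overlap. The Python sketch below illustrates that step under stated assumptions: the function name is hypothetical, the gene lists are a small toy subset built from genes named in the text plus placeholder entries (gene_x, gene_y, gene_z), and it does not reproduce the peak-to-gene assignment or the actual counts.

```python
def overlap_targets(bound_genes, up_genes, down_genes):
    """Split TF-bound genes by their direction of change in the knockdown."""
    bound = set(bound_genes)
    return {
        "bound_and_up": bound & set(up_genes),
        "bound_and_down": bound & set(down_genes),
    }


# Toy inputs; gene_x, gene_y and gene_z are placeholders, not real genes.
spi2_bound = {"kdrl", "etv2", "sox7", "gfi1aa", "ikaros", "gene_z"}
up_in_kd = {"kdrl", "etv2", "sox7", "gene_x"}
down_in_kd = {"gfi1aa", "ikaros", "gene_y"}

targets = overlap_targets(spi2_bound, up_in_kd, down_in_kd)
print("Bound and upregulated in KD:", sorted(targets["bound_and_up"]))
print("Bound and downregulated in KD:", sorted(targets["bound_and_down"]))
```

Genes that are bound and upregulated upon knockdown are candidates for repression by the factor, whereas genes that are bound and downregulated are candidates for activation.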

Fig. 6: Spi2 regulates lymphoid/myeloid HSPC fates by directly activating the transcription of master hematopoietic TFs and repressing that of the master endothelial TFs in HECs.

a Imaging of enhanced green fluorescent protein (EGFP) fluorescence in Tg(fli1a-flag-spi2-EGFP) embryos at 36 hpf. Scale bar, 100 μm. b Protein level of Flag in EGFP+ and EGFP− embryos at 36 hpf. c The binding motif of Spi2. d Venn diagram showing the overlap between Spi2-bound and Spi2 up- or downregulated genes. e GO analysis of Spi2-bound target genes identified by CUT&Tag analysis which were up- or downregulated in HECs of spi2 morphants as indicated by scRNA-seq at 36 hpf. f TF binding for Spi2 at kdrl, gfi1aa and ikaros genomic loci. The putative promoters are shaded. g Immunofluorescence imaging showing Spi2 staining (red), which displays co-localization (yellow) with gfi1:GFP+ cells in the Tg(gfi1:GFP) embryos at 36 hpf. Scale bar, 50 μm. h, i WISH (h) and quantification (i) showing the expression of gfi1aa in spi2+/+, spi2–/– and spi2 morphants. Scale bar, 50 μm. The results are represented as means ± SD. Student’s t-test, ****P < 0.0001. j qPCR showing the expression of gfi1aa in spi2+/+, spi2–/– and spi2 morphants. The results are represented as means ± SD. Student’s t-test, **P < 0.01. k, l WISH (k) and quantification (l) showing that gfi1aa mRNA rescues the expression of cmyb in the AGM at 36 hpf and CHT at 4 dpf of spi2–/– embryos. Scale bar, 50 μm. The results are represented as means ± SD. Student’s t-test, *P < 0.05, **P < 0.01, ****P < 0.0001.

Next, we sought to define the direct downstream targets of Spi2. Among more than 300 downregulated targets of Spi2 in spi2-KD HECs, we noticed the TF gene gfi1aa. Gfi1, the mammalian homolog of zebrafish gfi1aa, has been reported to be involved in the inhibition of the endothelial program, the full acquisition of hematopoietic properties and lymphoid/myeloid lineage development in mice, making it the highest-confidence candidate target gene of Spi2.19,20 To test whether gfi1aa is indeed a direct target of Spi2, we analyzed the CUT&Tag datasets and found that Spi2 could bind to the promoter of gfi1aa in ECs (Fig. 6d, f). In addition, the co-expression of Spi2 and Gfi1 was identified by immunofluorescence analysis in vivo (Fig. 6g). To further interrogate whether the hematopoietic defects caused by spi2 depletion were due to dysregulation of gfi1aa, we performed WISH and qPCR analyses and found that loss of spi2 decreased the expression of gfi1aa in the AGM region (Fig. 6h–j). Importantly, overexpression of gfi1aa efficiently rescued the HSPC defects in spi2–/– embryos (Fig. 6k, l). Taken together, our results suggest that gfi1aa serves as a direct target of Spi2 to control HSPC fate choice in HECs.

SPI1 inhibition impairs human lymphoid/myeloid-biased HSPC generation

To better understand the evolutionary conservation of L/M-HEC/HSPCs in the human and zebrafish AGM, we compared our STRT-seq dataset of spi2+ HEC/HSPCs with the recently published scRNA-seq dataset of human EHT.21 UMAP visualization of integrated data indicated that spi2+ early HEC, late HEC and HSPC in zebrafish merged well and separated stage-wise with HEC (HE) and HSC (HSC1 and HSC2) clusters in the human AGM (Fig. 7a). Interestingly, we found that the expression of SPI1 was highly enriched in HECs in the human dorsal aorta (Fig. 7b), which is similar to spi2 expression in zebrafish (Supplementary information, Fig. S3b–i). To investigate the HEC heterogeneity in human AGM, we classified the HEC in UMAP space based on threshold detection of SPI1 transcript in single cells in HEC cluster and obtained SPI1-positive HEC (SPI1-pos HEC) and SPI1-negative HEC (SPI1-neg HEC) subclusters (Fig. 7c). Furthermore, we compared the transcriptomic features of SPI1-pos and SPI1-neg HEC. GO analysis identified enrichment of genes in SPI1-pos HEC associated with leukocyte proliferation, T cell activation and regulation of immune effector process (Fig. 7d; Supplementary information, Table S9), in agreement with the presence of spi2+ L/M-HEC and their immune-responsive function in zebrafish (Fig. 3e–k). In contrast, SPI1-neg HEC was enriched with genes associated with endothelium development and response to lead ion. Collectively, these results suggest that L/M-HECs are evolutionarily conserved across species.
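
The threshold-based split of the HEC cluster into SPI1-pos and SPI1-neg cells can be illustrated as follows. This is a minimal sketch, assuming that a cell counts as positive when its SPI1 transcript count exceeds a cutoff of zero; the cutoff, the function name and the toy cell identifiers and counts are hypothetical, because the exact detection threshold is not given in the text.

```python
def classify_by_marker(expression, threshold=0):
    """Label each cell as marker-positive or marker-negative.

    `expression` maps a cell identifier to the marker's transcript count;
    a cell is positive when its count is strictly above `threshold`.
    """
    return {
        cell: "SPI1-pos" if value > threshold else "SPI1-neg"
        for cell, value in expression.items()
    }


# Toy SPI1 counts in cells of the HEC cluster (invented values).
spi1_counts = {"hec_1": 3, "hec_2": 0, "hec_3": 1, "hec_4": 0}

hec_labels = classify_by_marker(spi1_counts)
positive_cells = sorted(c for c, lab in hec_labels.items() if lab == "SPI1-pos")
print("SPI1-pos HECs:", positive_cells)
```

With sparse single-cell data, a zero count may reflect dropout rather than true absence of the transcript, so a split of this kind is best read as an operational classification.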

Fig. 7: SPI1 inhibition impairs human L/M-HSPC generation.

a Unsupervised graph-based clustering of single cells from human AGM and zebrafish spi2+ L/M-HEC/HSPCs. b UMAP showing the conserved signature genes of each cluster in human and zebrafish. c Classification of human HEC cluster based on SPI1 transcription. d GO terms enriched in SPI1-positive and -negative HECs, respectively. e Schematic diagram showing hematopoietic differentiation cultures of human iPSCs. SPI1 or control siRNA was added on day 5 of EHT. f qPCR showing the expression of DLL4, RUNX1 and SPI1 (relative to GAPDH) in sorted HECs and HSPCs at day 12. The results are presented as means ± SD. Student’s t-test, *P < 0.05, **P < 0.01, ***P < 0.001. g qPCR showing the expression of SPI1 (relative to GAPDH) on day 12 in SPI1 siRNA-treated HECs relative to control siRNA-treated HECs. The results are presented as means ± SD. Student’s t-test, ***P < 0.001. h FACS plots showing that SPI1 KD decreases CD34+CD45+ cells on day 12. i CFU-cell assays showing the colony-forming ability of HSPCs following SPI1 siRNA treatment. The results are presented as means ± SD. Student’s t-test, *P < 0.05, **P < 0.01, n.s., no significance.

To investigate the function of SPI1 in human hematopoiesis, we applied a human induced pluripotent stem cell (iPSC)-based hematopoietic differentiation system as reported previously (Fig. 7e).22 qPCR analysis of the expression of SPI1 in sorted ECs (CD31+CD34–CD45–), HECs (CD31+CD34+CD45–) and HSPCs (CD31–CD34+CD45+) on day 12 revealed that SPI1 was highly expressed in HECs and HSPCs, compared to ECs (Fig. 7f). SPI1 KD by small interfering RNA (siRNA) decreased the number of HSPCs (Fig. 7g, h). To examine the function of induced HSPCs in control and SPI1-KD groups, we performed colony-forming unit (CFU) assays and observed that SPI1 KD significantly reduced the formation of granulocyte-macrophage colonies, but not erythrocyte colonies, from HSPCs, suggestive of impaired L/M-HSPC generation (Fig. 7i). Importantly, SPI1 KD significantly reduced the expression of hemogenic genes (RUNX1, CMYB, HLF and GFI1) and myeloid lineage genes (CEBPB, LCP1 and CSF1R) in HECs (Supplementary information, Fig. S7g, h), indicating a specific regulatory role of SPI1 in human induced HECs. Collectively, these results suggest that spi2 in zebrafish and SPI1 in humans share an evolutionarily conserved role in definitive hematopoiesis across vertebrates.

Discussion

The HSPC pool is widely considered to be heterogeneous in terms of self-renewal and differentiation properties. Identification of the origin and function of HSPC heterogeneity is important for a deep understanding of the occurrence of fetal stress response and for the generation of transplantable heterogeneous HSPCs in vitro. With the advance of multicolor labeling transgenic methods, single-cell transcriptomics and genetic barcoding technologies, it is feasible to tackle HSPC heterogeneity in vivo.23,24,25 For example, in zebrafish, fate mapping assays using the zebrabow labeling system demonstrate the heterogeneous lineage outputs of embryonic HSCs that are derived from aortic endothelium.26 In mice, an unbiased labeling approach based on the Confetti colors reveals that newly specified blood progenitors in the aorta are polyclonal in origin.5,6 More recently, in situ cellular barcoding and fate mapping experiments showed that embryonic-born multipotent progenitors, independent of traditional HSCs, can also contribute to the young adult blood system in mice.8 These studies raise the possibility that the heterogeneity of HSCs may have been pre-defined at earlier stages.7,11,27,28,29 To determine the origin of heterogeneous embryonic-born blood stem and progenitor cells, in this study, we further characterize HEC heterogeneity by integrated single-cell multi-omics and in vivo lineage tracing strategies. Together with the newly generated Spi2 reporter line, our single-cell transcriptome analysis, lineage tracing and functional assays demonstrate that L/M-HSPCs originate at the HEC stage. Our results in zebrafish embryos and human iPSCs also demonstrate the evolutionary conservation of HSPC heterogeneity through EHT across vertebrates. Whether and how we can define the origins of other types of distinct lineage-biased HSPCs and their corresponding regulators needs to be further investigated.

Recently, single-cell transcriptome analysis and temporal lineage tracing experiments showed that draculin (drl)-labeled HSC-independent progenitors (Prog+) and HSCs largely exhibit stage-specific contribution to embryonic and adult hematopoiesis, and the contribution of drl+ HSCs to lymphoid/myeloid populations likely starts around day 8 in zebrafish.30 According to our zebrafish scRNA-seq data, we found that drl is highly enriched in E-HSPC, but relatively lower in L-HSPC and M-HSPC, suggesting their limited lymphoid and myeloid contribution. To uncover the exact contribution of HSCs and Prog+ to the HSC pool, identifying more specific markers that can label HSCs and/or Prog+ is required. In our work, we found that spi2 can specifically label HEC/HSPC, especially L/M-HEC/HSPC, but not other early blood progenitors and niche cells during definitive hematopoiesis. Functionally, ablation of spi2+ cells by MTZ-induced cytotoxicity from 1 dpf to 2 dpf or spi2 loss-of-function can decrease the number of embryonic differentiated lymphoid and myeloid lineages in the CHT and thymus at 4 dpf, but not the Prog+ at 30 hpf.31 Therefore, we speculate that a possible explanation for the discrepancies between the recent publication30 and our work may be attributed to the different markers used, which likely represent distinct HSPC subpopulations during zebrafish developmental hematopoiesis. Together, our findings reveal the temporal and spatial-specific expression of spi2 in L/M-HEC/HSPCs and that spi2+ HEC-derived HSPCs are responsible for the establishment of embryonic hematopoietic lineages, especially the lymphoid and myeloid cells.

During adult hematopoiesis, in response to infection, blood stem and progenitor compartments generate more myeloid outputs via emergency myelopoiesis.32 However, whether emergency-like hematopoiesis also exists in embryogenesis has not been reported yet. In our study, Spi2+ HECs are characterized by enriched inflammatory stress and immune response-related transcriptome and chromatin features. Moreover, these cells can generate HSPCs with lymphoid/myeloid-biased differentiation potential. Given that inflammatory signals initiated by local immune cells or cell-intrinsic transposon elements, R-loop, and metabolic cues are essential for HSPC formation in zebrafish and mouse embryos,16,25,33,34,35,36,37,38,39 we speculate that HEC heterogeneity could be considered as an efficient stress-responsive mechanism to meet a variety of body demands during embryogenesis under both normal and stress conditions.

Spi1 is an Ets-family TF and is expressed in most hematopoietic lineages, including the HSCs and differentiated cell types.40,41 It was originally identified as an important factor that regulates or coordinates with other TFs, such as c-JUN, GATA1, GATA2, C/EBP and RUNX1, during terminal differentiation and lineage commitment of myeloid cells, as well as development of B and T cells.42,43,44,45,46,47 In addition, it is also one of the cocktail TFs for the generation of functional human HSCs from human iPSCs and the generation of engraftable mouse HSCs from adult endothelial cells in vitro.48,49,50 However, the role of SPI1 in regulating definitive hematopoiesis across vertebrates has not yet been identified. Notably, a previous study has shown that Spi1 is also expressed in definitive hemogenic endothelium in mouse embryonic day 11.5 embryos,51 indicating its potential role in HSPC development in mammals. The zebrafish genome comprises three homologous spi1 genes: spi1a, spi1b and spi2. spi1a and spi1b are not specifically expressed in HECs, suggesting that another spi gene may play a role similar to that of mammalian Spi1. Interestingly, a recent zebrafish study reported that genetic ablation of spi2 results in compromised proliferation of HSPCs in the CHT during early development and severe myeloid lineage defect in adulthood.31 However, whether and how zebrafish spi2 or mammalian SPI1 regulates HSPC specification and heterogeneity across vertebrates remains elusive. In the present work, we delineate the function of spi2 (the SPI1 homolog gene in zebrafish) and its regulatory mechanism in L/M-HSPC generation. We propose that the ETS TF, Spi2, acts as the earliest lymphoid/myeloid fate determinant in HECs by suppressing the endothelial program and activating proper expression of the master hematopoietic lineage regulators (such as Gfi1aa). In addition, we show that the expression, function and molecular regulation of SPI1 are conserved between zebrafish embryos and human HSPCs in vitro; however, whether these findings can be recapitulated in human embryos in vivo remains unaddressed.

Taken together, these findings support that HSPC heterogeneity originates at the HEC stage and that Spi2-mediated genome-wide transcriptional regulation is responsible for the development of embryonic-born L/M-HSPCs, which will provide new insights into the generation of the first lineage-primed HSPCs in vivo or in vitro.

Materials and methods

Zebrafish

Zebrafish strains including AB, transgenic line (Tg)(kdrl:mCherry),52 Tg(runx1:enGFP),12 Tg(CD41:GFP),53 Tg(gata1:dsRed),54 Tg(cmyb:eGFP),55 Tg(fli1a:EGFP),56 Tg(gfi1:GFP),57 Tg(sclα:dsRed),58 Tg(UAS:GFP),59 Tg(UAS:Kaede),60 and Tg(UAS:nfsb-mCherry)61 were raised in system water at 28.5 °C and staged as previously described.62 Embryos from crosses between adult male and female zebrafish were used for all experiments. The experiments were approved by the Ethical Review Committee in the Institute of Zoology, Chinese Academy of Sciences, China.

Generation of Tg(spi2:Gal4) and spi2 mutants using CRISPR/Cas9

For the generation of the Tg(spi2:Gal4) line, the improved Gal4 variant Gal4FF was inserted into the spi2 genomic locus on chromosome 4 using the CRISPR/Cas9 system according to published protocols.60 Guide RNA (gRNA) targeting the last intron of zebrafish spi2 was designed using http://zifit.partners.org/ZiFiT/Disclaimer.aspx. The sequence of the gRNA is 5’-CTGGGTTGGCAGTTATGGGC-3’. gRNA was synthesized as previously described.63 Cas9 mRNA was synthesized from the pXT7 vector containing the humanized Cas9 sequence using the mMESSAGE mMACHINE™ mRNA transcription synthesis kit (Invitrogen; AM1344), and cleaned up using the RNAclean Kit (TIANGEN; DP412). Next, a donor plasmid spi2-P2A-Gal4 was designed consisting of three parts: a left arm, a P2A-Gal4-coding sequence, and a right arm (Fig. 2a). Donor plasmid (50 pg), gRNA (200 pg) and hCas9 mRNA (250 pg) were co-injected into one-cell-stage fertilized zebrafish eggs. The identification of F0 and F1 transgenic lines was performed by Polymerase Chain Reaction (PCR) and DNA fragment sequencing. The primers for genotyping are provided in Supplementary information, Table S10. Tg(spi2:Gal4) was outcrossed with Tg(UAS:GFP),59 Tg(UAS:Kaede),60 and Tg(UAS:nfsb-mCherry),61 respectively, to obtain the Tg(spi2:Gal4/UAS:GFP), Tg(spi2:Gal4/UAS:Kaede) and Tg(spi2:Gal4/UAS:nfsb-mCherry) lines.

For generation of spi2–/–, the gRNA targeting exon 3 of spi2 was designed using http://zifit.partners.org/ZiFiT/Disclaimer.aspx. The gRNA sequence is 5’-GGAGCATGTGGCGGAGCTGT-3’. Cas9 mRNA (250 pg) and spi2 gRNA (200 pg) were co-injected into AB zebrafish embryos at the one-cell stage. The identification of F0, F1 and F2 mutants was performed using PCR and Sanger sequencing. The DNA fragment of spi2–/– carrying the 5-bp insertion (ATGTG), but not that of spi2+/+, was digested by the restriction enzyme MslI. The primers for genotyping are provided in Supplementary information, Table S10.

WISH and FISH

The RNA probe targeting spi2 was transcribed by T7 polymerase and labeled with digoxigenin (DIG) using DIG RNA labeling mix (Roche, 11093274910) or with fluorescein (FLU) using FLU RNA labeling mix (Roche, 11426346910). Embryos were fixed in 4% paraformaldehyde (PFA) at different stages and dehydrated with methanol at –20 °C. WISH was performed as previously described with probes for genes including spi2, runx1, cmyb, gata1, rag1, pu.1, l-plastin, mfap4, gfi1aa, and ikaros.64 To confirm the specific expression pattern of spi2, cross-sections of the trunk region of 24 hpf embryos were prepared after WISH staining. The embryos were treated with 30% sucrose and washed three times with PBST. After being embedded in OCT compound (SAKURA, 4583), the embryos were sectioned using a LEICA CM1900 cryostat. FISH was performed as previously described to detect expression of spi2.65 The embryos were hybridized with FLU-labeled spi2 antisense RNA probes, incubated with anti-FLU-POD antibody (1:100, Roche, 11426346910), and stained with TSA-Cy3 amplification reagent (1:100, PerkinElmer, NEL744001KT). Bright-field images were acquired with a Nikon SMZ 1500 microscope. The fluorescence images were acquired with a Nikon confocal A1 laser microscope.

Western blot and Immunofluorescence analyses

Protein samples were extracted from mouse lentivirus-infected cells, zebrafish spi2+/+, spi2+/–, spi2–/–, spi2 morphant, fli1a-flag-spi2-EGFP-injected embryos, and control embryos at 36 hpf as previously described66 and separated on SDS-PAGE gels, followed by transfer to PVDF membranes. Membranes were blocked with 5% skimmed milk in Tris-buffered saline plus 0.1% Tween-20 (TBST) and incubated with primary antibodies at 4 °C overnight. Membranes were washed with TBST and incubated with secondary antibodies for 2 h at room temperature. Finally, membranes were washed with TBST and the signals were examined with a chemiluminescent HRP substrate (Millipore). The following antibodies were used: rabbit anti-Flag antibody (1:2000, Sigma, F7425), rabbit anti-β-actin antibody (1:1000–1:2000, Cell Signaling Technology, 4967) and anti-rabbit HRP-linked antibody (1:1000–1:2000, Cell Signaling Technology, 7074). To make the Spi2 antibody, a peptide comprising amino acids 1–117 at the N-terminal region of zebrafish Spi2 protein was synthesized to immunize rabbits. Polyclonal antibodies were purified from serum after 3-month immunization (ABclonal Biotechnology Co., Ltd) and were used for western blot analysis (1:1000).

Whole-mount immunofluorescence was performed as described previously.67 The primary antibody for immunofluorescence was anti-Spi2 (1:500, Abclonal), and secondary antibody was anti-rabbit IgG fluorescein (1:500, Invitrogen, A11037).

Photoconversion, confocal microscopy and time-lapse imaging

Kaede+ cells in AGM from Tg(spi2:Gal4/UAS:Kaede) were converted by 405-nm UV laser at 28 hpf as previously described.10 The photoconverted cells in AGM at 28 hpf and their progenies in the caudal hematopoietic tissue and thymus were observed at 4 dpf. For time-lapse imaging, zebrafish embryos of Tg(kdrl:mCherry/spi2:Gal4;UAS:GFP) and Tg(kdrl:mCherry/cmyb:eGFP) were embedded in agarose (1% in E3 medium) containing tricaine anesthetic at a temperature of 30 °C. Z-stacks were taken every 15 min. Confocal images were acquired by Nikon confocal A1 laser microscope and Andor Dragonfly confocal microscopy system. Movies and images were generated following processing with ImageJ software and Photoshop CS6.

Chemical treatment

Cell ablation in zebrafish embryos was carried out as previously described.68,69 Embryos of Tg(spi2:Gal4/UAS:nfsb-mCherry) at 28 hpf were incubated in DMSO or 10 mM MTZ (Sigma, M3761, in DMSO) from 28 hpf to 48 hpf, with the solution refreshed daily. The cell ablation efficiency and hematopoietic phenotype were analyzed at 36 hpf and 4 dpf by confocal imaging and WISH. Wild-type and Tg(kdrl:mCherry/spi2:Gal4;UAS:GFP) embryos were incubated with 0.5 mg/mL LPS (Sigma-Aldrich, L2880) from 28 hpf to 36 hpf. Then, the hematopoietic phenotypes of control and LPS-treated embryos were analyzed at 36 hpf and 4 dpf by confocal imaging and WISH.

Morpholinos, mRNA synthesis, vector construction and microinjection

The antisense splice morpholino of spi2 was designed (Supplementary information, Table S10) and purchased from GeneTools. The KD efficiency of the spi2 splice morpholino was validated by PCR and the primers are listed in Supplementary information, Table S10. spi2 mRNA was synthesized using the mMESSAGE mMACHINE™ mRNA transcription synthesis kit (Invitrogen, AM1344), and cleaned up using the RNAclean Kit (TIANGEN, DP412) (Supplementary information, Table S10). 3 ng of morpholino and 100 ng of mRNAs were injected into one-cell-stage embryos, respectively. For the EC-specific overexpression experiments, the Flag-tagged full-length cDNA of spi2 was cloned and assembled into the pDestTol2pA2 vector with a fli1a promoter and an EGFP reporter by the Gateway system to generate the pDestTol2pA2-fli1a-flag-spi2-v2a-EGFP plasmid.70 The plasmid (50 pg) and Tol2 mRNA (25 pg) were co-injected into one-cell-stage zebrafish embryos at the yolk/blastomere boundary.

Human iPSCs hematopoietic differentiation and embryoid body (EB) dissociation

iPSCs (BC1 cell line) were used as the starting cells for hematopoietic cell induction as previously reported.22 Briefly, iPSCs were cultured on Matrigel-coated 35 mm dishes in mTeSR1 before induction and were digested into single cells by Accutase when preparing for induction. Then iPSCs were inoculated into 96-well round bottom plates at an initial density of 3000–4000 cells per well with 50 μL day 0 SFM medium (10 μM Y27632, 10 ng/mL BMP4, 10 ng/mL FGF2) and cultured at 37 °C, 20% O2 and 5% CO2. The hematopoietic differentiation media were added on day 2 (10 ng/mL BMP4, 10 ng/mL FGF2, 20 ng/mL VEGF and 100 ng/mL SCF), day 5 (10 ng/mL BMP4, 10 ng/mL FGF2, 10 ng/mL VEGF and 50 ng/mL SCF), day 8 (10 ng/mL BMP4, 10 ng/mL FGF2, 10 ng/mL VEGF and 50 ng/mL SCF) and day 11 (10 ng/mL BMP4, 10 ng/mL FGF2, 10 ng/mL VEGF, 50 ng/mL SCF and 20 ng/mL TPO), respectively.

During the induction, EBs were used for FACS analysis. They were dissociated to single cell suspension and resuspended in 1 mL DMEM containing 5% FBS, then incubated with multi-color antibody combinations at 4 °C for 20–30 min. The cells were sorted by Moflo XDP (Beckman).

siRNA interference and CFU assay

For the SPI1 siRNA KD assay, the siRNAs for SPI1 (NM_003120.3) were designed and synthesized by GenePharma (Supplementary information, Table S10). 4 pmol of the SPI1 siRNA and control siRNA were transfected individually into wells using Lipofectamine RNAiMAX (Invitrogen, 13778030). For the CFU assay, 1000 CD34+CD45+ hematopoietic cells sorted by flow cytometry were added into 24-well ultra-low attachment plates (Stem Cell Technologies) coated with 1 mL Methocult H4434 (Stem Cell Technologies, H4434), rotated gently several times and cultured at 37 °C, 20% O2 and 5% CO2. The CFUs were classified and counted according to their morphology after 14 days.

qPCR

Cells from the trunk region of Tg(kdrl:mCherry/runx1:enGFP) at 36 hpf were dissected and dissociated for generating single-cell suspensions as previously described.71 ECs, HECs and HSPCs in control and SPI1-KD groups were dissociated from EBs at day 12 during human iPSC induction. Cells were sorted using MoFlo XDP (Beckman) and data were analyzed with FlowJo software. Total RNA from sorted cells was extracted using the QIAGEN RNeasy Mini Kit (QIAGEN, 74104) and then reverse-transcribed and amplified as previously described.72 Total RNA was extracted from the zebrafish trunk region of spi2+/+, spi2–/– and embryos injected with gfi1aa mRNA at 36 hpf and 4 dpf, spi2 morphants, PBS- and LPS-treated embryos at 36 hpf (20–40 embryos pooled for each sample) with TRIzol reagent (Invitrogen, 15596018) and then reverse-transcribed using M-MLV reverse transcriptase (Promega, M1705). The qPCR was performed in a Bio-Rad system (CFX96) and data were analyzed in GraphPad Prism 8.0. The primers used in this study are provided in Supplementary information, Table S10.
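As an illustration of how expression relative to a reference gene is commonly derived from qPCR data, the sketch below applies the widely used 2^-ΔΔCt calculation. The text does not state which quantification method was applied, so this is an assumption; the function name and the Ct values are hypothetical toy inputs.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target gene by the 2^-ddCt method (assumed method).

    ct_target / ct_reference: Ct values in the treated sample.
    ct_target_ctrl / ct_reference_ctrl: Ct values in the control sample.
    """
    delta_ct_sample = ct_target - ct_reference          # normalize to reference gene
    delta_ct_ctrl = ct_target_ctrl - ct_reference_ctrl
    delta_delta_ct = delta_ct_sample - delta_ct_ctrl    # compare with control
    return 2 ** (-delta_delta_ct)


# Toy Ct values: the target amplifies two cycles later in the treated sample.
fold_change = relative_expression(26.0, 18.0, 24.0, 18.0)
```

With these toy inputs the target is detected two cycles later after normalization, corresponding to a four-fold reduction.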

scRNA-seq and scATAC-seq

To enrich ECs, HECs and HSPCs, ~40,000 kdrl:mCherry+runx1:enGFP– cells, ~40,000 kdrl:mCherry+runx1:enGFP+ cells and ~30,000 kdrl:mCherry–runx1:enGFP+ cells were sorted from Tg(kdrl:mCherry/runx1:enGFP) at 36 hpf, using MoFlo XDP (Beckman). kdrl:mCherry+runx1:enGFP– cells, kdrl:mCherry+runx1:enGFP+ cells and kdrl:mCherry–runx1:enGFP+ cells were sorted from spi2-KD Tg(kdrl:mCherry/runx1:enGFP) at 36 hpf at the ratio of 4:4:3, which was similar to the control samples in Supplementary information, Fig. S1a. Then, based on cell stock concentration and the 10× Genomics guideline on cell loading, we loaded ~20,000 cells in wild type and ~15,000 cells in spi2 morphants for scRNA-seq, and ~90,000 cells for scATAC-seq. For scRNA-seq, libraries were prepared using Single Cell 3’ Library and Gel Bead Kit V3.1. Sequencing was performed on an Illumina Novaseq6000 platform to generate 150-bp paired-end reads. For scATAC-seq, nuclei were isolated and washed according to the methods supplied by 10× Genomics. Libraries were prepared using the Chromium Chip E Single Cell Kit and Chromium Single Cell ATAC Library & Gel Bead Kit, and further sequenced on an Illumina Novaseq6000 platform to generate 50-bp paired-end reads. scRNA-seq (wild type and spi2 morphants) and scATAC-seq were carried out by CapitalBio Technology Inc.

Processing of scRNA-seq data

The CellRanger (version 4.0.0) was used to generate the gene expression matrix with default parameters. The reference genome is the zebrafish genome danRer11. Then, a set of 10× Genomics-based data (single-cell transcriptomics of 36 hpf zebrafish AGM-derived fluorescence-labeled cells) was further processed in Seurat (version 3.2.1).73 Briefly, cells with more than 500 genes and mitochondrial gene content less than 10% were used to generate Seurat objects. The scRNA-seq datasets of wild type and spi2 morphants were integrated using Canonical Correlation Analysis (CCA)73 and cluster cell identity was assigned by manual annotation based on known markers (Supplementary information, Table S1)24 combined with DEG analysis. We projected EC/HE/HSPC cells of spi2 morphants onto the wild-type dataset using scmap package with default settings.74 Developmental directions of EHT cells were predicted by VECTOR, a tool which identifies the starting cells using both the quantiles of PC values and the connectivity of cells in UMAP.
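The cell-level quality filter stated above (more than 500 detected genes and mitochondrial content below 10%) can be expressed as a simple predicate. This is a minimal sketch rather than the Seurat code used in the study; the function name, the record layout and the toy cells are hypothetical.

```python
def passes_scrna_qc(n_genes, pct_mito, min_genes=500, max_pct_mito=10.0):
    """Return True if a cell meets both thresholds quoted in the text."""
    return n_genes > min_genes and pct_mito < max_pct_mito


# Toy per-cell summaries: (cell ID, genes detected, % mitochondrial reads).
toy_cells = [
    ("cell_A", 1850, 3.2),   # passes both criteria
    ("cell_B", 420, 2.0),    # too few genes
    ("cell_C", 2300, 14.5),  # mitochondrial content too high
]
kept = [cell for cell, genes, mito in toy_cells if passes_scrna_qc(genes, mito)]
```

Only cells satisfying both conditions are carried forward to integration and clustering.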

Unsupervised clustering, batch correction and annotation of scRNA-seq data

The standard sctransform vignette of Seurat was used to perform unsupervised clustering and annotation analysis. Based on the selected top 30 principal components, the function “FindClusters” was used to find the cell clusters at 0.6 resolution. Cluster identity was assigned by manual annotation using known marker genes. To identify signature genes of each cell cluster, “FindAllMarkers” in Seurat was used with default parameters. The UMAP analysis was based on the top 6 principal components. Next, the annotated ECs and HEC/HSPCs were selected and subjected to sub-clustering analysis. Low-quality ECs marked by low UMI counts per cell were filtered out. Finally, at 0.5 resolution, six cell types including VEC, AEC, HEC, M-HSPC, L-HSPC and E-HSPC were annotated using known marker genes. HECs were subclustered by “FindClusters” in Seurat at 0.4 resolution.

Processing of scATAC-seq data

Raw fastq files were aligned to the danRer11 reference genome and quantified using Cell Ranger ATAC Software (v.1.2.0). For quality control of the scATAC-seq data, high-quality cells were kept with a unique fragment number between 1500 and 10,000 and a TSS enrichment value greater than 8. Finally, 3908 cells passed quality control for further analysis. We used the ArchR (v.1.0.1) software to perform scATAC-seq clustering analysis. Briefly, we constructed 500-bp windows tiled across the genome and computed a cell-by-window sparse matrix. Next, we identified the top 3000 accessible windows and performed an LSI dimensionality reduction on these windows. Clusters were identified using Seurat’s SNN graph clustering “FindClusters” with a resolution of 0.3. These same reduced dimensions were used as input to Seurat’s “RunUMAP” with default parameters and plotted in ggplot2 using R. We chose 784 cells containing EC, HE and HSPC clusters to do the subcluster analyses. To accomplish this, 15,000 variable features from the tile matrix were used in the final LSI iteration. Clusters were identified using Seurat’s SNN graph clustering “FindClusters” with a resolution of 0.7. We called peaks on the Tn5-corrected insertions using MACS2 with parameters “--shift -75 --extsize 150 --nomodel --call-summits --nolambda --keep-dup all -q 0.05 -g 1.4e9”. We then created a union peak set containing a total of 123,103 non-redundant peaks. We counted the accessibility for each cell in these peak regions to create an accessibility matrix. The gene activity score matrix was built using ArchR with default parameters. We then used a list of marker genes to annotate cell types. Global TF activity was then measured using the ArchR “addDeviationsMatrix” function. The cell type-specific peaks were identified using ArchR “getMarkerFeatures” by Wilcoxon test, and the threshold was set as P value < 0.01. Motif enrichment analysis was conducted with cell type-specific peaks using ArchR “peakAnnoEnrichment”. TFs were defined from the JASPAR2020_CORE_vertebrates_non-redundant databases.
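To make the MACS2 parameter string quoted above easier to read, the sketch below assembles it into an argument list for `macs2 callpeak`. The option values come from the text; the input file name, the sample name and the `-f BED` input format are hypothetical assumptions, and the command is only built, not executed.

```python
import shlex


def macs2_callpeak_args(insertions_bed, sample_name):
    """Build a MACS2 callpeak command with the parameters quoted in the text.

    insertions_bed and sample_name are hypothetical placeholders, and the
    BED input format is an assumption not stated in the study.
    """
    return [
        "macs2", "callpeak",
        "-t", insertions_bed,   # treatment file (assumed: Tn5 insertions as BED)
        "-f", "BED",
        "-n", sample_name,
        "--shift", "-75",       # shift reads 75 bp upstream ...
        "--extsize", "150",     # ... then extend 150 bp, centering on the cut site
        "--nomodel",
        "--call-summits",
        "--nolambda",
        "--keep-dup", "all",
        "-q", "0.05",
        "-g", "1.4e9",          # effective genome size used in the text
    ]


args = macs2_callpeak_args("insertions.bed", "HEC_cluster")
command_line = shlex.join(args)  # printable form; nothing is run here
```

Because the shift is half the extension size in the opposite direction, each 150-bp window is centered on the Tn5 insertion site, which is the usual reason for pairing these two options.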

STRT-seq

To perform STRT-seq of spi2+ lineages, 148 mCherry+GFP+ cells and 93 mCherry–GFP+ cells were sorted from Tg(kdrl:mCherry/spi2:Gal4;UAS:GFP) at 36 hpf. For library preparation, single cells were sorted into 0.2 mL thin-wall 8-strip PCR tubes (Axygen) containing 2.55 μL lysis buffer with 1.805 μL nuclease-free water (Ambion), 0.5 μL dNTP (10 mM, TAKARA), 0.095 μL Triton X-100 (10%, Sigma-Aldrich), 0.05 μL recombinant ribonuclease inhibitor (40 U/μL, TAKARA) and 0.1 μL barcode primers (10 μM, synthesized by Invitrogen). Barcode primers were anchored with an 8-nt UMI sequence to remove reads from PCR duplicates. The processes of reverse transcription, cDNA pre-amplification, biotin tag attachment, cDNA fragmentation, 3’ end of cDNA enrichment and library construction were performed according to the protocol as described.75,76 Finally, the libraries were sequenced on an Illumina NovaSeq6000 platform to generate 150-bp paired-end reads.
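As a quick consistency check on the lysis buffer recipe above, the component volumes can be summed and the final dNTP concentration derived. The volumes and stock concentrations are those given in the text; the dictionary layout and variable names are illustrative only.

```python
from decimal import Decimal

# Per-cell lysis buffer components in microliters, as listed in the text.
LYSIS_BUFFER_UL = {
    "nuclease-free water": Decimal("1.805"),
    "dNTP (10 mM)": Decimal("0.5"),
    "Triton X-100 (10%)": Decimal("0.095"),
    "ribonuclease inhibitor (40 U/uL)": Decimal("0.05"),
    "barcode primer (10 uM)": Decimal("0.1"),
}

# Decimal arithmetic avoids floating-point rounding in the volume sum.
total_ul = sum(LYSIS_BUFFER_UL.values(), Decimal("0"))

# Final dNTP concentration (mM) = stock concentration x volume / total volume.
dntp_final_mM = Decimal("10") * LYSIS_BUFFER_UL["dNTP (10 mM)"] / total_ul
```

The components add up to the stated 2.55 μL per tube, and the dNTP stock is diluted roughly five-fold in the final buffer.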

Processing of STRT-seq data

For STRT-seq data, raw reads were first de-multiplexed by barcode sequences in read 2 to yield separate read files for individual cells, then the transcript sequences of each cell in read 1 were separated based on the corresponding read 2. Simultaneously, UMI sequences in read 2 were integrated into read 1. The template switching oligo sequence, polyA sequence and the low-quality reads (N > 10%) in read 1 were subsequently removed by Python scripts and Trimmomatic (version 0.36).77 Next, the clean reads were aligned to the zebrafish genome (danRer11 from Ensembl) using HISAT2 (version 2.1.0)78 with known gene annotation. Only protein-coding genes were retained and the abundance of each gene was estimated by counting the reads with duplicated UMIs excluded. For quality control on cells, we filtered out low-quality cells with fewer than 500 genes detected and more than 30% of total reads aligned to mitochondrial genes. Additionally, cells with more than 7000 genes detected and more than 1,000,000 reads mapped to protein-coding genes were considered as doublets and were discarded as well. In total, 222 out of 241 cells were retained. Dimensional reduction and clustering analysis were conducted using the Seurat package (version 3.2.2).73,79 Mitochondrial and ribosomal RNA genes or genes expressed in fewer than three cells were excluded from further analysis. To remove the effects of UMI counts and cell cycle genes, these two variables were regressed out by running ScaleData. Highly variable genes were identified by the singleCellHaystack package (version 0.3.3)80 and fed to principal component analysis. Then, we selected the top 8 principal components for clustering and subsequent UMAP dimension reduction for data visualization using FindClusters and RunUMAP. Finally, we obtained three clusters that were assigned as early HEC, late HEC and HSPC. The developmental pseudotime of these three clusters was estimated by Monocle 2 (version 2.8.0).81 Gene modules that were significantly positively correlated with HEC were determined by the WGCNA package (version 1.70-3).82 Functional enrichment analysis of gene modules was performed using Metascape (https://metascape.org).83
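The UMI-based counting step described above, in which reads sharing a UMI are counted once per gene, can be sketched as follows. This is a simplified illustration and not the scripts used in the study; the function name and the toy reads are hypothetical, and sequencing errors within UMIs are ignored.

```python
from collections import defaultdict


def count_unique_umis(reads):
    """Count distinct UMIs per gene for a single cell.

    reads: iterable of (gene, umi) tuples; reads with the same gene and
    UMI are treated as PCR duplicates and counted once.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Toy reads for one cell, using 8-nt UMIs as in the library design.
toy_reads = [
    ("spi2", "AACCGGTT"),
    ("spi2", "AACCGGTT"),  # PCR duplicate: same gene, same UMI
    ("spi2", "TTGGCCAA"),
    ("runx1", "ACGTACGT"),
]
counts = count_unique_umis(toy_reads)
```

Collapsing duplicates this way means that gene abundance reflects the number of captured molecules rather than the amplification depth.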

CUT&Tag

The Tg(fli1a-flag-spi2-EGFP) transgenic line was generated and the embryos were collected at 36 hpf. After digesting the embryos into a single-cell suspension using 0.5% trypsin, ~4000 EGFP+ ECs per sample were sorted and used for CUT&Tag experiments using the Hyperactive® In-Situ ChIP Library Prep Kit for Illumina (Vazyme, TD901). For library amplification, 24 μL of DNA was mixed with 1 μL of TruePrep Amplify Enzyme (TAE, Vazyme), 10 μL of 5× TruePrep Amplify Enzyme buffer and 5 μL of ddH2O, as well as 5 μL of uniquely barcoded i5 and i7 primers from TruePrep Index Kit V2 for Illumina (Vazyme). Libraries were sequenced on an Illumina NovaSeq platform and 150-bp paired-end reads were generated. High-throughput sequencing data were filtered using Cutadapt (version 1.18). Reads were aligned to GRCz11 by Bowtie2 (version 2.3.5.1). Only uniquely mapped reads with mapping quality score ≥ 30 were kept using Samtools (version 1.9). After merging replicates, MACS2 (version 2.2.7.1) was used for peak calling. Signal tracks were generated using the bamCoverage function (deeptools, version 3.5.1) and viewed in the WashU Epigenome Browser web server.
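The mapping-quality filter mentioned above can be illustrated on SAM-formatted text, where the fifth tab-separated field of each alignment holds the MAPQ score. This is a pure-Python sketch for illustration; the study used Samtools, whose `samtools view -q 30` option applies the same cutoff, but the exact command is not given in the text. The function name and toy records are hypothetical.

```python
def filter_by_mapq(sam_lines, min_mapq=30):
    """Keep SAM header lines and alignments with MAPQ >= min_mapq."""
    kept = []
    for line in sam_lines:
        if line.startswith("@"):          # header lines are passed through
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:    # field 5 of a SAM record is MAPQ
            kept.append(line)
    return kept


# Toy SAM records (11 mandatory fields each); values are illustrative only.
toy_sam = [
    "@SQ\tSN:chr4\tLN:1000",
    "read1\t0\tchr4\t100\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII",
    "read2\t0\tchr4\t200\t12\t8M\t*\t0\t0\tTTGGCCAA\tIIIIIIII",
    "read3\t16\tchr4\t300\t30\t8M\t*\t0\t0\tGGCCAATT\tIIIIIIII",
]
filtered = filter_by_mapq(toy_sam)
```

Note that a MAPQ cutoff alone is only a proxy for unique mapping; how uniqueness was defined in the study is not specified.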

GSEA and GO enrichment

GSEA (version 4.0.3) was performed to compare gene expression signatures of spi2 morphants versus wild-type (control) cluster cells. The annotated gene sets were selected from the Molecular Signatures Database (MSigDB version 7.5).

GO enrichment analysis was performed on DEGs using Metascape83 or R package clusterProfiler (version 3.10.1)84 with default parameters.

Phylogenetic analysis

Phylogenetic analysis of SPI family members was based on the ETS DNA-binding domain. The dendrogram was constructed using Clustal-x 2 for alignment and viewed with MEGA11 using human ELF5 as an outgroup. Bootstrap values were derived from 1000 bootstrap trials.

Quantitation and statistical analysis

GraphPad Prism 6 was used to analyze the statistical data of qPCR, confocal imaging, flow cytometry, and CFU culture. Values are presented as the means ± SD. Student’s unpaired two-tailed t-test was applied for comparisons unless otherwise indicated.
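For readers who wish to reproduce the summary statistics outside GraphPad Prism, the sketch below computes the mean ± SD of each group and the unpaired Student's t statistic with pooled variance. It uses only the Python standard library; the function names and toy values are hypothetical, and the two-tailed P value, which requires the t distribution, is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sd(values):
    """Return (mean, sample standard deviation)."""
    return mean(values), stdev(values)


def student_t_unpaired(a, b):
    """Unpaired Student's t statistic (equal variances assumed) and df."""
    n1, n2 = len(a), len(b)
    s1, s2 = stdev(a), stdev(b)
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Toy relative-expression values for illustration only.
control = [1.0, 1.1, 0.9]
knockdown = [0.5, 0.4, 0.6]
t_stat, dof = student_t_unpaired(control, knockdown)
```

The t statistic and degrees of freedom can then be passed to any t-distribution routine to obtain the two-tailed P value.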