Overview of scRNA-seq data of enriched dendritic cells from human cord blood
DCs comprise heterogeneous populations with diverse surface markers and functions (Yin et al. 2017; Yu et al. 2015; Villani et al. 2017). In both adult and neonatal blood mononuclear cells, DC frequencies are low (~1%). To enrich DCs from cord blood mononuclear cells (CBMCs), we labeled T cells, B cells, NK cells and monocytes with the lineage-specific markers CD3, CD19, CD16 and CD14, respectively, and depleted them with magnetic beads. We then performed single-cell RNA sequencing on all lineage-negative cells, followed by clustering analysis and dimensional reduction.
Among the 7004 single cells that passed quality control, eight clusters (C1–C8) were identified (Fig. 1A). We mapped each cluster to known immune cell types according to a set of lineage-specific markers (Butler et al. 2018) and the genes most highly expressed in each cluster (Fig. 1B, and supplementary Fig. S1A, Table S1). C1, C2 and C3 cells highly express HLA-DQA1 and GPR183 and were therefore classified as DCs; these three clusters make up 61.1% of all cells. C4 cells, which express high levels of NKG7 and GNLY (also known as NKG5), are natural killer (NK) cells. C5, C6 and C7 cells are diverse types of progenitor cells that express CD34 at different levels (C5 > C6 > C7); these three clusters make up 31.3% of all cells. Interestingly, C5 cells express FLT3, which is essential for DC development (Waskow et al. 2008), suggesting that C5 may contain DC progenitors. C8 cells are erythrocytes, based on their high expression of hemoglobin genes such as HBA1, HBA2 and HBB (supplementary Fig. S1B).
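The marker-based annotation step can be illustrated with a minimal sketch. It is not the study's pipeline: it assumes per-cluster mean log-normalized expression values are already available, z-scores each gene across clusters so that very highly expressed genes (such as hemoglobins) do not dominate, and labels each cluster with the cell type whose markers score highest. The marker panel, the function names and the toy values are hypothetical; only the gene names come from the text.

```python
from statistics import mean, pstdev

# Hypothetical marker panel: gene names are from the text, but grouping them
# this way is an illustrative assumption, not the study's exact marker list.
MARKERS = {
    "DC": ["HLA-DQA1", "GPR183"],
    "NK": ["NKG7", "GNLY"],
    "progenitor": ["CD34"],
    "erythrocyte": ["HBA1", "HBA2", "HBB"],
}


def zscore_by_gene(cluster_means):
    """Scale each gene across clusters so highly expressed genes do not dominate."""
    genes = sorted({g for profile in cluster_means.values() for g in profile})
    scaled = {cluster: {} for cluster in cluster_means}
    for gene in genes:
        values = [profile.get(gene, 0.0) for profile in cluster_means.values()]
        mu, sd = mean(values), pstdev(values)
        for cluster, profile in cluster_means.items():
            scaled[cluster][gene] = (profile.get(gene, 0.0) - mu) / sd if sd > 0 else 0.0
    return scaled


def annotate_clusters(cluster_means, markers=MARKERS):
    """Label each cluster with the cell type whose markers have the highest mean z-score."""
    scaled = zscore_by_gene(cluster_means)
    labels = {}
    for cluster, profile in scaled.items():
        scores = {
            cell_type: mean(profile.get(g, 0.0) for g in genes)
            for cell_type, genes in markers.items()
        }
        labels[cluster] = max(scores, key=scores.get)
    return labels


# Made-up mean log-normalized expression values, for illustration only.
TOY = {
    "C1": {"HLA-DQA1": 2.5, "GPR183": 1.8, "NKG7": 0.1, "GNLY": 0.0,
           "CD34": 0.2, "HBA1": 0.1, "HBA2": 0.1, "HBB": 0.3},
    "C4": {"HLA-DQA1": 0.2, "GPR183": 0.5, "NKG7": 3.0, "GNLY": 2.7,
           "CD34": 0.0, "HBA1": 0.1, "HBA2": 0.1, "HBB": 0.2},
    "C5": {"HLA-DQA1": 0.6, "GPR183": 0.3, "NKG7": 0.2, "GNLY": 0.1,
           "CD34": 2.2, "HBA1": 0.2, "HBA2": 0.2, "HBB": 0.4},
    "C8": {"HLA-DQA1": 0.0, "GPR183": 0.0, "NKG7": 0.0, "GNLY": 0.0,
           "CD34": 0.1, "HBA1": 4.5, "HBA2": 4.6, "HBB": 5.0},
}

print(annotate_clusters(TOY))
# -> {'C1': 'DC', 'C4': 'NK', 'C5': 'progenitor', 'C8': 'erythrocyte'}
```

In practice, marker-based labels like these are checked against the full list of cluster-specific genes (Table S1) rather than accepted automatically.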
Currently, t-distributed stochastic neighbor embedding (t-SNE) is the most widely used method for dimensional reduction in single-cell analysis; it helps reveal local data structure in mass cytometry and single-cell transcriptomic data (Amir et al. 2013). A newer algorithm with similar functionality, uniform manifold approximation and projection (UMAP), was developed recently (McInnes et al. 2018) and has been applied to biological data, where it was reported to outperform t-SNE by producing a more meaningful organization of cell clusters (Becht et al. 2018). To test whether this advantage applies to our data, we compared the UMAP output with those of t-SNE and principal component analysis (PCA, a well-known dimensional reduction method widely used in microarray and bulk RNA-seq analysis). In the UMAP output, the clusters identified as dendritic cells (C1, C2 and C3) grouped together, as did the clusters identified as progenitor cells (C5, C6 and C7) (Fig. 1A). This aggregation of DC clusters and progenitor clusters was not observed in the PCA or t-SNE outputs (supplementary Fig. S2A, S2B). In the PCA output, the cell clusters were not well separated. In the t-SNE output, although cluster structure was clearly present, the DC clusters were scattered across the plot and interwoven with progenitor clusters. We therefore used UMAP rather than t-SNE for all plots in this analysis.
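The comparison above was made by inspecting the plots. One hypothetical way to put a number on "grouped together" versus "interwoven" is a k-nearest-neighbour purity score computed on each method's 2D coordinates: for every cell of a target group, take the fraction of its k nearest neighbours that belong to the same group, then average. This metric, the function name `knn_purity` and the toy layouts are assumptions for illustration; they were not part of the original analysis.

```python
import math


def knn_purity(coords, groups, target, k=3):
    """Mean fraction of each `target` cell's k nearest neighbours that are also `target`.

    coords: list of (x, y) embedding coordinates; groups: one label per cell.
    Values near 1 mean the group is compact; lower values mean it is interwoven.
    """
    fractions = []
    for i, g in enumerate(groups):
        if g != target:
            continue
        # Sort all other cells by Euclidean distance and keep the k closest.
        neighbours = sorted(
            (math.dist(coords[i], coords[j]), j)
            for j in range(len(coords)) if j != i
        )[:k]
        fractions.append(sum(groups[j] == target for _, j in neighbours) / k)
    return sum(fractions) / len(fractions)


GROUPS = ["DC"] * 4 + ["progenitor"] * 4
# Made-up layouts: DCs in one block vs. DCs alternating with progenitors on a line.
GROUPED = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
INTERWOVEN = [(0, 0), (2, 0), (4, 0), (6, 0), (1, 0), (3, 0), (5, 0), (7, 0)]

print(knn_purity(GROUPED, GROUPS, "DC"))     # 1.0
print(knn_purity(INTERWOVEN, GROUPS, "DC"))  # 0.333...
```

To compare UMAP, t-SNE and PCA this way, the same score would be computed on each embedding with the same k and the merged DC (C1 to C3) and progenitor (C5 to C7) groupings.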
In droplet-based sequencing methods such as 10x, two or more cells can be captured in one droplet, thereby sharing the same cell barcode and behaving like a single cell during analysis (Zheng et al. 2017). These are known as doublets (or multiplets). They are difficult to distinguish from single cells from their individual transcriptomes alone, but they can be predicted by analyzing the entire single-cell RNA-seq dataset (DePasquale et al. 2018; McGinnis et al. 2018; Wolock et al. 2018). According to the 10x Genomics Single Cell 3′ Reagent Kits version 2 user guide, the expected multiplet rate for about 7000 recovered cells is approximately 5.4%. The dataset analyzed here (7004 cells) therefore contains approximately 400 doublets (7004 × 0.054 ≈ 378). We predicted doublets using a recently developed method (McGinnis et al. 2018) and found that their distribution did not severely skew the clustering results (supplementary Fig. S2C). Nevertheless, doublet identities served as important reference criteria in the more detailed clustering analyses described below.
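The idea behind the doublet prediction step can be shown with a simplified sketch of the approach used by DoubletFinder (McGinnis et al. 2018). This is a hypothetical re-implementation, not the package: artificial doublets are created by averaging the profiles of random pairs of real cells, and each real cell is scored by the proportion of artificial doublets among its k nearest neighbours (pANN). Cells that sit among many artificial doublets are likely real doublets. The actual method works in PCA space on normalized expression and includes parameter selection; here cells are plain 2D vectors, and `n_artificial`, `k`, the seed and the toy layout are illustrative assumptions.

```python
import math
import random


def expected_doublets(n_cells, multiplet_rate):
    """Expected number of multiplets given the vendor's rate for this loading."""
    return n_cells * multiplet_rate


def pann_scores(cells, n_artificial, k, seed=0):
    """Simplified DoubletFinder-style pANN: for each real cell, the fraction of
    artificial doublets among its k nearest neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    artificial = []
    for _ in range(n_artificial):
        a, b = rng.sample(range(len(cells)), 2)  # two distinct real cells
        artificial.append(tuple((x + y) / 2 for x, y in zip(cells[a], cells[b])))
    pool = [(c, False) for c in cells] + [(c, True) for c in artificial]
    scores = []
    for i, cell in enumerate(cells):
        nearest = sorted(
            (math.dist(cell, other), is_artificial)
            for j, (other, is_artificial) in enumerate(pool) if j != i
        )[:k]
        scores.append(sum(flag for _, flag in nearest) / k)
    return scores


# Made-up 2D "expression" space: two cell types plus one real doublet.
TYPE_A = [(20 + dx, dy) for dx in range(5) for dy in range(4)]
TYPE_B = [(dx, 20 + dy) for dx in range(5) for dy in range(4)]
DOUBLET = (12.0, 11.5)  # midpoint of the two type centroids
CELLS = TYPE_A + TYPE_B + [DOUBLET]

print(round(expected_doublets(7004, 0.054)))  # 378, i.e. roughly 400
scores = pann_scores(CELLS, n_artificial=40, k=5, seed=1)
print(scores[-1])  # pANN of the real doublet
```

Note that doublets formed by two cells of the same type (homotypic doublets) land inside their parent cluster and are largely invisible to this kind of score, which is one reason predicted doublets are used here as reference criteria rather than as a definitive filter.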
Identification of known DC subtypes in scRNA-seq data
Of the three dendritic cell clusters (C1, C2 and C3), C1 and C2 express plasmacytoid dendritic cell (pDC) signature genes (e.g., GZMB and JCHAIN), whereas C3 shows apparent heterogeneity in DC signature gene expression (e.g., CLEC9A for cDC1s and CD1C for cDC2s). We therefore examined how the choice of clustering parameter influences the number of subclusters resolved within C3. The relevant parameter in our clustering pipeline (the Louvain community detection method implemented in the Seurat package) is called “resolution”, and it controls the number of clusters indirectly: as the resolution value increases, new subclusters emerge and the total number of clusters increases (supplementary Fig. S3A, S3B). We determined the optimal “resolution” value by calculating the differentially expressed genes between newly emerged clusters at every branching event. At this value, C3 was divided into four distinct clusters (C3-0, C3-1, C3-2 and C3-3) (Fig. 2A, and supplementary Table S2).
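The resolution-selection rule described above can be sketched as follows. The text gives only the principle, so this is a hypothetical reconstruction under stated assumptions. Cluster labels are assumed to be precomputed at increasing resolution values, for example from Seurat's clustering step. Clusterings are assumed to be approximately nested, so each new subcluster can be traced back to its parent. A crude DE criterion (absolute difference in mean log-expression of at least `min_log_fc`) stands in for a proper statistical test. `choose_resolution`, `count_de_genes`, `min_de` and `min_log_fc` are illustrative names, not the authors' code.

```python
from statistics import mean


def count_de_genes(expr, cells_a, cells_b, min_log_fc=1.0):
    """Crude DE count: genes whose mean log-expression differs by >= min_log_fc.
    A stand-in for a proper test such as Wilcoxon; illustrative only."""
    n_genes = len(next(iter(expr.values())))
    count = 0
    for g in range(n_genes):
        diff = mean(expr[c][g] for c in cells_a) - mean(expr[c][g] for c in cells_b)
        if abs(diff) >= min_log_fc:
            count += 1
    return count


def choose_resolution(expr, clusterings, min_de=1, min_log_fc=1.0):
    """Walk up the resolution ladder and keep the last resolution at which every
    newly emerged subcluster is supported by at least `min_de` DE genes.

    clusterings: list of (resolution, {cell: label}) sorted by resolution.
    """
    best = clusterings[0][0]
    for (_, prev), (res, cur) in zip(clusterings, clusterings[1:]):
        # Group cells by (parent label at previous resolution, new label).
        children = {}
        for cell, label in cur.items():
            children.setdefault(prev[cell], {}).setdefault(label, []).append(cell)
        supported = True
        for kids in children.values():
            if len(kids) < 2:
                continue  # this parent did not split
            groups = list(kids.values())
            # Each new subcluster is compared against the rest of its parent.
            for i, group in enumerate(groups):
                rest = [c for j, other in enumerate(groups) if j != i for c in other]
                if count_de_genes(expr, group, rest, min_log_fc) < min_de:
                    supported = False
        if not supported:
            break  # a branching event without DE support: stop here
        best = res
    return best


# Made-up data: 6 cells x 2 genes of log-normalized expression.
EXPR = {
    "c0": [3.0, 0.10], "c1": [3.0, 0.20], "c2": [3.0, 0.15],
    "c3": [0.0, 0.10], "c4": [0.0, 0.20], "c5": [0.0, 0.15],
}
CLUSTERINGS = [
    (0.2, {c: 0 for c in EXPR}),
    (0.4, {"c0": 0, "c1": 0, "c2": 0, "c3": 1, "c4": 1, "c5": 1}),
    (0.8, {"c0": 0, "c1": 2, "c2": 2, "c3": 1, "c4": 1, "c5": 1}),
]
print(choose_resolution(EXPR, CLUSTERINGS))  # 0.4
```

In the toy data, the split at resolution 0.4 is driven by a real expression difference (gene 0), whereas the split at 0.8 separates cells with essentially identical profiles, so the walk stops at 0.4.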
All the dendritic cell subtypes previously identified in adult peripheral blood (Villani et al. 2017; Yin et al. 2017; Yu et al. 2015) were found in cord blood: C3-3 corresponds to cDC1s, C3-0 and C3-1 to cDC2s, C3-2 to AXL+ DCs, and C1 and C2 to pDCs (Fig. 2B). Comparing C3-0 with C3-1, we observed that C3-1 highly expressed signature genes of CD5low cDC2s (Fig. 2B, and supplementary Fig. S2C). In addition, the genes differentially expressed between C3-0 and C3-1 showed the same relative expression levels in bulk RNA-seq data from CD5high and CD5low cDC2s (Yin et al. 2017) (Fig. 2C, D). Although only a small fraction of cells express CD5 in either cluster (7.8% in C3-0 and 3.4% in C3-1), the mean CD5 expression in C3-0 is higher than that in C3-1 (supplementary Fig. S3C). We therefore concluded that C3-0 represents CD5high cDC2s and C3-1 represents CD5low cDC2s.
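The statement that the differentially expressed genes "showed the same relative expression levels" in the bulk data can be framed as a direction-concordance check. For each gene shared by both datasets, it asks whether the fold change (C3-0 vs C3-1 in single-cell data, CD5high vs CD5low in bulk data) has the same sign. This is a minimal sketch under that interpretation. The gene names and values are placeholders, not data from this study or from Yin et al. (2017), and the pseudocount is an assumed convention.

```python
import math


def log2_fold_change(a, b, pseudocount=1.0):
    """log2((a + pseudocount) / (b + pseudocount)); the pseudocount avoids log(0)."""
    return math.log2((a + pseudocount) / (b + pseudocount))


def direction_concordance(sc_lfc, bulk_lfc):
    """Fraction of shared genes whose fold change points the same way in both datasets.
    A fold change of exactly 0 is counted as 'not up'."""
    shared = [g for g in sc_lfc if g in bulk_lfc]
    agree = sum((sc_lfc[g] > 0) == (bulk_lfc[g] > 0) for g in shared)
    return agree / len(shared)


# Hypothetical genes and values, for illustration only.
# Single-cell log2 fold changes, C3-0 vs C3-1:
SC_LFC = {"GENE_A": 1.2, "GENE_B": -0.8, "GENE_C": 0.5, "GENE_D": -1.5}
# Bulk expression as (CD5high, CD5low), converted to log2 fold changes:
BULK = {"GENE_A": (40, 10), "GENE_B": (5, 20), "GENE_C": (12, 8), "GENE_D": (30, 10)}
BULK_LFC = {g: log2_fold_change(hi, lo) for g, (hi, lo) in BULK.items()}

print(direction_concordance(SC_LFC, BULK_LFC))  # 0.75 (GENE_D disagrees)
```

A concordance close to 1 across many DE genes supports matching C3-0 to CD5high and C3-1 to CD5low cDC2s. A value near 0.5 would be what random assignment produces.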
By aligning cord blood DCs and adult peripheral blood DCs using canonical correlation analysis (CCA) (Butler et al. 2018), we found that the DC subtypes in cord blood matched well with those in adult peripheral blood (supplementary Fig. S4A, S4B). Moreover, corresponding DC subtypes in cord blood and adult peripheral blood shared similar signature genes (supplementary Fig. S4C, Table S3). Interestingly, the AXL+ DC signature genes in cord blood are a subset of those in adult peripheral blood, suggesting that some genes in AXL+ DCs (including CX3CR1 and CD5) may be gradually upregulated during development. In addition, consistent with their origin in cluster C3, which was originally identified as cDCs, cord blood AXL+ DCs (C3-2) are much closer to cDCs than to pDCs. Hierarchical clustering also revealed this relationship: cord blood AXL+ DCs (C3-2) clustered first with cDC2s and then with cDC1s (Fig. 2E).
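How a hierarchical clustering of subtype profiles produces a merge order like the one described above can be illustrated with a short sketch. The text does not state the distance or linkage used, so this assumes agglomerative clustering of subtype mean-expression profiles with distance 1 − Pearson r and average linkage. The five-gene profiles are made up and only exercise the code. They are not the study's data.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_linkage(profiles):
    """Agglomerative clustering with distance 1 - Pearson r and average linkage.
    Returns merges in order, each as a pair of tuples of member names."""
    dist = {
        frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }

    def linkage(pair):
        # Average distance over all member pairs of the two clusters.
        left, right = pair
        return mean(dist[frozenset((a, b))] for a in left for b in right)

    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        left, right = min(combinations(clusters, 2), key=linkage)
        clusters.remove(left)
        clusters.remove(right)
        clusters.append(left + right)
        merges.append((left, right))
    return merges


# Made-up mean expression over five hypothetical genes.
PROFILES = {
    "cDC1": [5.0, 1.0, 0.0, 3.0, 0.0],
    "cDC2": [1.0, 5.0, 0.0, 3.0, 1.0],
    "AXL+ DC": [1.0, 4.0, 1.0, 3.0, 2.0],
    "pDC": [0.0, 0.0, 5.0, 0.0, 4.0],
}
for left, right in average_linkage(PROFILES):
    print(left, "+", right)
```

The printed merges read bottom-up through the dendrogram: the first pair joined is the most similar, and the last merge attaches the most distinct subtype.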
A potentially new DC subtype with the megakaryocyte gene expression profile
In contrast to peripheral blood, in which pDCs form a homogeneous population (Villani et al. 2017), cord blood contains two distinct pDC clusters: C1 and C2. Both express pDC-specific genes such as GZMB and JCHAIN (IGJ); however, C2 also expresses megakaryocyte-specific genes such as PPBP (pro-platelet basic protein) and PF4 (platelet factor 4) (Fig. 1B, 3A, and supplementary Table S4). This indicates that C1 cells correspond to standard pDCs, whereas C2 cells have characteristics of both pDCs and megakaryocytes. Using the clustering method based on differential expression analysis, we found that cluster C1 cannot be divided further, whereas cells in cluster C2 are still separable (supplementary Fig. S5A).
To clarify the relationship between cluster C2 and pDCs/megakaryocytes, we used a public single-cell dataset of 33,000 peripheral blood mononuclear cells (PBMCs) that contains pDC and megakaryocyte clusters. To build a high-quality reference, we developed a method based on differential expression and correlation analysis that iteratively removes low-quality cells. After this cleaning step, 109 megakaryocytes and 103 pDCs were retained and pooled to define a reference set of genes differentially expressed between the two groups (supplementary Table S5). Spearman correlation coefficients between each cell in C1/C2 and the megakaryocyte/pDC references revealed significant differences between clusters C1 and C2. All C1 cells correlate well with pDCs, whereas C2 cells form a transitional population spread between pDCs and megakaryocytes (Fig. 3B, and supplementary Fig. S5B). The diversity of cluster C2 was also revealed by the expression levels of pDC/megakaryocyte signature genes (Fig. 3C, D) and by principal component analysis (supplementary Fig. S1A). Although cluster C2 can be further divided into subgroups, the differentially expressed genes at every branching event in the clustering tree overlap with megakaryocyte signature genes (supplementary Fig. S5C). This indicates that the diversity within cluster C2 is dominated by megakaryocyte signatures.
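The per-cell correlation step can be sketched as follows. This assumes each reference is summarized as one profile per group (for example, mean expression of the cleaned pDC or megakaryocyte cells) over the reference genes, and that each C1/C2 cell is compared with both profiles using Spearman's rho, computed as the Pearson correlation of average ranks. Average ranks are the usual convention for ties. The iterative cleaning step is not shown. Function names, the six-gene layout and the values are hypothetical.

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5  # undefined if either vector is constant


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def reference_scores(cell, pdc_ref, mk_ref):
    """(rho to pDC reference, rho to megakaryocyte reference) over the same genes."""
    return spearman(cell, pdc_ref), spearman(cell, mk_ref)


# Made-up values over six hypothetical reference genes
# (first three pDC-high, last three megakaryocyte-high).
PDC_REF = [5.0, 4.0, 3.0, 0.5, 0.2, 0.1]
MK_REF = [0.1, 0.3, 0.2, 6.0, 5.0, 4.0]
PDC_LIKE_CELL = [4.0, 5.0, 2.0, 0.0, 0.0, 0.1]

print(reference_scores(PDC_LIKE_CELL, PDC_REF, MK_REF))  # positive rho to pDC, negative to Mk
```

Plotting each cell's pair of coefficients gives the kind of view described for Fig. 3B: a cell resembling standard pDCs sits at high pDC correlation and low megakaryocyte correlation, while transitional cells spread between the two.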
We considered several hypotheses for the origin of cluster C2. One is that these cells are artifacts “generated” during sequencing and computational analysis when megakaryocytes and pDCs form doublets (Zheng et al. 2017). However, the expected total number of doublets in the dataset (~400) is much lower than the size of cluster C2 (794 cells). Furthermore, if most C2 cells were doublets between pDCs and megakaryocytes, the number of true megakaryocytes would have to be much larger. To verify the existence of C2 cells independently, we used flow cytometry to look for a pDC subpopulation that expresses C2 signature genes at the cell surface. CD42a (GP9) and CLEC1B are expressed at higher levels in C2 than in C1, and flow cytometry showed that Lin−HLA-DR+CD1C−CD123+ pDCs indeed contain a CD42a+ or CLEC1B+ subpopulation (Fig. 4A). Compared with pDCs and cDC2s, CLEC1B+ pDCs express higher mRNA levels of PPBP and PF4, consistent with the single-cell RNA-seq data (Fig. 4B). Thus, expression of megakaryocyte-specific genes in C2 was confirmed at both the RNA level (by PCR) and the protein level (by fluorescence-activated cell sorting, FACS). These experiments show that C2 cells exist before sequencing.
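The doublet argument is back-of-the-envelope arithmetic, and the sketch below makes it explicit. Using only numbers from the text (7004 cells, 5.4% multiplet rate, 794 C2 cells), even the extreme case in which every expected doublet landed in C2 would explain under half of the cluster. The second function adds a simplifying assumption not stated in the source: if droplets pair cells at random, only a fraction 2·f_A·f_B of all doublets are A-B (here pDC-megakaryocyte) doublets, which tightens the bound further. Cell-type frequencies for that formula are not given in the text, so none are plugged in here.

```python
def expected_doublets(n_cells, multiplet_rate):
    """Expected number of multiplets given the vendor's rate for this loading."""
    return n_cells * multiplet_rate


def max_fraction_explained(n_doublets, cluster_size):
    """Upper bound: share of a cluster that doublets could explain if all landed in it."""
    return min(1.0, n_doublets / cluster_size)


def expected_heterotypic(n_doublets, freq_a, freq_b):
    """Expected A-B doublets under an assumed random-pairing model: n * 2 * f_a * f_b."""
    return n_doublets * 2 * freq_a * freq_b


n_doublets = expected_doublets(7004, 0.054)  # ~378, numbers from the text
print(round(max_fraction_explained(n_doublets, 794), 2))  # 0.48
```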
Dendritic cells in the NK cell cluster
Cell heterogeneity can also be observed in the heatmap of cluster signature genes (Fig. 1B). For example, a subgroup of cells in cluster C3 expresses signature genes of cluster C1 (pDCs); these cells include AXL+ DCs and doublets between pDCs and cDCs (supplementary Fig. S2C). Cluster C4 (NK cells) is also heterogeneous with respect to the expression of C1 signature genes. To determine whether cluster C4 contains DC-like cells, we applied the subclustering approach previously used for C1 and C3, and found that C4 can be further divided into six subpopulations (Fig. 5A).
These six subclusters (C4-0 to C4-5) differ markedly in their expression of pDC and NK cell signature genes. Clusters C4-1 and C4-2 both express high levels of the pDC signature genes GZMB and JCHAIN, while clusters C4-2 and C4-3 express low levels of the NK signature genes NKG7 and GNLY (Fig. 5B). Accordingly, cells in C4-2 are closely related to pDCs, and cells in C4-1 might be doublets between pDCs and NK cells, which is supported by doublet prediction (supplementary Fig. S2C). To better characterize the relationship of these subpopulations to pDCs and NK cells, we selected pDCs and NK cells from the same 33,000-cell PBMC dataset to build a high-quality reference (107 pDCs and 1549 NK cells) (supplementary Table S6). Correlation analysis showed that C4-2 cells correlated more closely with pDCs than the other C4 subpopulations did (Fig. 5C). However, differential expression analysis revealed that every gene expressed at a higher level in C4-2 than in C1 was expressed at an even higher level in C4-3 (supplementary Fig. S6A, Table S7). One of the genes that C4-2 and C4-3 cells express at higher levels than C1 is CD3D, a T cell signature gene; however, C4-3 cells do not express mature T cell markers such as CD4 or CD8 (supplementary Fig. S6B). NK cells and T cells are phenotypically similar and often cluster together in single-cell analyses (Butler et al. 2018; Oetjen et al. 2018). Therefore, whether C4-2 is a true cell type or a mixture of pDCs and C4-3 cells remains to be verified.
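The mixture argument can be written down as a simple check. A genuinely distinct cell type would usually have some genes that are elevated in it but not in its neighbours. If, instead, every gene elevated in C4-2 relative to C1 (pDCs) is even higher in C4-3, the C4-2 profile is consistent with (though not proof of) a blend of pDCs and C4-3 cells. This is a minimal sketch: the mean-difference threshold `min_diff` stands in for a proper DE test, and the per-cell values are made up. Only the gene names CD3D and GZMB come from the text.

```python
from statistics import mean


def up_genes(expr_a, expr_b, min_diff=0.5):
    """Genes whose mean expression in group A exceeds group B by >= min_diff."""
    return {g for g in expr_a if mean(expr_a[g]) - mean(expr_b[g]) >= min_diff}


def mixture_consistent(c4_2, c1, c4_3, min_diff=0.5):
    """Check the pattern described in the text: every gene up in C4-2 vs C1 is
    expressed even higher in C4-3. Returns (pattern_holds, genes_checked)."""
    genes = up_genes(c4_2, c1, min_diff)
    holds = all(mean(c4_3[g]) > mean(c4_2[g]) for g in genes)
    return holds, genes


# Made-up per-cell log-expression values: {gene: [cell1, cell2, ...]}.
C1 = {"CD3D": [0.0, 0.0, 0.0], "GZMB": [3.0, 3.0, 3.0]}
C4_3 = {"CD3D": [2.0, 2.0, 2.0], "GZMB": [0.0, 0.0, 0.0]}
C4_2 = {"CD3D": [1.0, 1.0, 0.5], "GZMB": [2.0, 2.0, 1.0]}

print(mixture_consistent(C4_2, C1, C4_3))  # (True, {'CD3D'})
```

Because intermediate values can also arise from a real transitional state, a positive result from this check only motivates the experimental verification called for above. It does not settle the question.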