Recent advances in microfluidics and cDNA barcoding have led to a dramatic increase in the throughput of single-cell RNA-Seq (scRNA-Seq) [1,2,3,4,5]. However, unlike earlier or less scalable techniques [6,7,8], these new tools do not offer a straightforward way to directly link phenotypic information obtained from individual, live cells to their expression profiles. Nonetheless, microwell-based implementations of scRNA-Seq are compatible with a wide variety of phenotypic measurements including live cell imaging, immunofluorescence, and protein secretion assays [3, 9,10,11,12]. These methods involve co-encapsulation of individual cells and barcoded mRNA capture beads in arrays of microfabricated chambers. Because the barcoded beads are randomly distributed into microwells, one cannot directly link phenotypes measured in the microwells to their corresponding expression profiles. In Single Cell Optical Phenotyping and Expression Sequencing (SCOPE-Seq), we use optically barcoded beads [13] and use fluorescence microscopy to identify the sequencing barcode that is associated with each single-cell cDNA library on the sequencer. Thus, we can obtain images, movies, or other phenotypic data from individual cells by microscopy and directly link this information to genome-wide expression profiles.

We previously demonstrated the compatibility of the commercially available “Drop-seq” beads [1] with our microwell array system for scRNA-Seq [9, 14]. Here, we generate optically barcoded mRNA capture beads for our microwell array system from “Drop-seq” beads. These beads are conjugated to oligonucleotides with a cell-identifying sequencing barcode and a 3′-poly(dT) terminus. To enable optical identification of the sequencing barcode on each bead, we attach a unique combination of oligonucleotides selected from a set of 12 in two cycles of split-pool ligation (Fig. 1a, left). We refer to this combination as an optical barcode, because it can be decoded by sequential fluorescence hybridization, to distinguish it from the sequencing barcode. Immediately prior to ligation, we sonicate the beads to detach a small fraction of oligonucleotides, from which we generate a “bead-free” sequencing library (Additional file 1: Supplementary Methods, Fig. 1a, right). Split-pool, single-stranded ligation produces a final pool of beads with both optical and sequencing barcodes (dual-barcoded beads, Fig. 1b). Sequencing the bead-free library associated with each ligation reaction produces a look-up table linking the optical and sequencing barcodes for each bead in the pool (Fig. 1c). Importantly, bead-free library construction and sequencing is done once for each batch of optically barcoded beads, which can be used for many SCOPE-Seq experiments.

Fig. 1

a Workflow for generating the dual-barcoded beads and the associated bead-free DNA sequencing libraries. b Sequence composition of the bead-bound mRNA capture oligonucleotides and optical barcode oligonucleotides (OBOs). c A look-up table linking sequencing barcodes and optical barcodes. d Workflow for SCOPE-Seq. e Workflow for linking live, single-cell imaging data with single-cell RNA-Seq profiles

In SCOPE-Seq, we load cells into a microwell array as described previously [3, 9], collect live cell imaging data, and co-encapsulate the cells with dual-barcoded beads (Fig. 1d). The microwell array device used here has a total of 30,500 microwells (see Additional files 2 and 3 for design of the microwell array device). We then use a computer-controlled system to perform on-chip cell lysis, mRNA capture, reverse transcription, and exonuclease digestion. This process generates PCR-amplifiable, sequence-barcoded cDNA that is covalently attached to each bead. Next, we perform “optical demultiplexing”: 12 cycles of reversible fluorescence hybridization that determine which combination of the 12 optical barcode oligonucleotides (OBOs) is present on each bead (Additional file 1: Supplementary Methods, Fig. 1d, e). We then cut the microwell array into multiple pieces and extract the beads from each piece for scRNA-Seq library construction. Beads from each piece are processed and indexed separately, thereby increasing our multiplexing capacity to 2^12 × N, where N = 10 is the number of pieces, giving us an effective barcode library size of 40,960. The resulting scRNA-Seq libraries from different pieces are then pooled and sequenced to obtain the RNA-Seq profile of the individual cells and their sequencing barcodes. Finally, we process the images to identify the optical barcode on each bead and identify the corresponding sequencing barcode using the look-up table described above to link microscopy data from a cell in a particular microwell to its RNA-Seq profile (Fig. 1e).
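The capacity arithmetic and the final look-up step can be illustrated with a minimal Python sketch. It assumes that each of the 12 OBOs is simply called present or absent on a bead, so that an optical barcode can be written as a 12-bit integer, and that the look-up table is keyed by the pair (piece index, optical code). The function names, the table layout, and the toy barcode are hypothetical and are not taken from the SCOPE-Seq software.

```python
from typing import Dict, Optional, Sequence, Tuple

NUM_OBOS = 12    # optical barcode oligonucleotides, one probed per hybridization cycle
NUM_PIECES = 10  # N, the number of array pieces that are indexed separately


def barcode_capacity(num_obos: int = NUM_OBOS, num_pieces: int = NUM_PIECES) -> int:
    """Effective barcode library size, 2^num_obos optical codes per piece."""
    return (2 ** num_obos) * num_pieces


def decode_optical_barcode(calls: Sequence[bool]) -> int:
    """Pack per-cycle present/absent calls into one integer (first cycle = highest bit)."""
    if len(calls) != NUM_OBOS:
        raise ValueError(f"expected {NUM_OBOS} calls, got {len(calls)}")
    code = 0
    for present in calls:
        code = (code << 1) | int(present)
    return code


def link_bead(
    calls: Sequence[bool],
    piece: int,
    lookup: Dict[Tuple[int, int], str],
) -> Optional[str]:
    """Return the sequencing barcode for a bead, or None if the bead cannot be linked.

    `lookup` is a hypothetical table keyed by (piece index, optical code).
    """
    return lookup.get((piece, decode_optical_barcode(calls)))


if __name__ == "__main__":
    print(barcode_capacity())  # 4096 optical codes x 10 pieces
    toy_lookup = {(0, 1): "ACGTACGTACGT"}  # made-up sequencing barcode
    print(link_bead([False] * 11 + [True], piece=0, lookup=toy_lookup))
```

Returning `None` for a missing key mirrors the idea that a bead whose optical barcode cannot be resolved is left unlinked rather than assigned to the wrong profile.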

To characterize the performance and demonstrate the utility of SCOPE-Seq, we performed a mixed-species experiment in which human (U87) and mouse (3T3) cells were labeled with differently colored live stain dyes, mixed together, and analyzed with SCOPE-Seq. Differential labeling allows us to determine the species of each captured cell by live cell imaging in the microwells. We can independently determine the species associated with each sequencing barcode using the corresponding scRNA-Seq data, from which we detected ~ 3600 unique transcripts per cell on average. We considered any inconsistencies between these two independent species identifications to result from optical barcode linkage error. We define the linking accuracy as the fraction of single-cell imaging and RNA-Seq data sets that are correctly linked among all linked data sets. Figure 2a shows a scatter plot of the number of reads aligning uniquely to the human and murine transcriptomes for each cell that was linked to imaging data. Importantly, the data points derived from scRNA-Seq are colored using the actual two-channel fluorescence data from live cell microscopy. Species calling was 98.6% concordant between the imaging and scRNA-Seq data, and after correcting for the difference in abundance of human and murine cells, we obtained an imaging-to-sequencing linking accuracy of 96.1% for singlets from which we can unambiguously obtain a species call. Of a total of 2352 RNA-Seq expression profiles, 1133 were linked to imaging data (48.2%). Most unlinked cells result from optical demultiplexing errors, which usually lead to a failure to link rather than an incorrect link. Because the yield of linked optical and sequencing barcodes is high (89.6%), we know that these errors arise from the optical demultiplexing process itself.
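As a rough illustration of the two summary statistics in this paragraph, the following Python sketch computes the fraction of expression profiles that were linked to imaging data and the raw concordance between imaging-based and sequencing-based species calls. The function names and the toy species calls are hypothetical, and the sketch does not attempt the correction for species abundance, which is not specified here.

```python
from typing import Sequence


def linked_fraction(n_linked: int, n_profiles: int) -> float:
    """Fraction of RNA-Seq expression profiles that were linked to imaging data."""
    if n_profiles <= 0:
        raise ValueError("n_profiles must be positive")
    return n_linked / n_profiles


def species_concordance(
    imaging_calls: Sequence[str], sequencing_calls: Sequence[str]
) -> float:
    """Fraction of linked cells whose imaging and sequencing species calls agree."""
    if len(imaging_calls) != len(sequencing_calls):
        raise ValueError("call lists must have the same length")
    if not imaging_calls:
        raise ValueError("at least one linked cell is required")
    matches = sum(a == b for a, b in zip(imaging_calls, sequencing_calls))
    return matches / len(imaging_calls)


if __name__ == "__main__":
    # Counts reported in the text: 1133 of 2352 profiles were linked.
    print(f"{100 * linked_fraction(1133, 2352):.1f}% linked")
    # Toy calls, for illustration only.
    imaging = ["human", "mouse", "mouse", "human"]
    sequencing = ["human", "mouse", "human", "human"]
    print(species_concordance(imaging, sequencing))
```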

Fig. 2

a Scatter plot of the number of uniquely aligned human and murine reads for each linked sequencing barcode colored by the relative fluorescence intensity of the human and murine labels before and b after imaging-based multiplet removal. c Principal component analysis of the 19 fluorescence imaging features for the murine cells. d Correlation between the first PC and each imaging feature. e Correlation between the second PC and each imaging feature. f Normalized enrichment scores for MSigDB gene sets enriched in genes that are correlated with the first (purple) and second (magenta) PCs

The microwell array contains a small number of multi-species multiplets (Fig. 2a). As expected from the high linking accuracy of SCOPE-Seq, the species purity of the RNA-Seq expression profiles is consistent with the fluorescent labels from the live cell imaging data (Fig. 2a). Interestingly, we were able to remove most of the multi-species multiplets using monochrome live cell imaging data (Additional file 1: Supplementary Methods, Fig. 2b). We called scRNA-Seq profiles with a species purity below 70% human-mouse multiplets. We then blinded ourselves to the two-color fluorescence information that distinguishes human-mouse multiplets in our imaging data and attempted to identify multiplets by manually examining the monochrome images. Imaging- and sequencing-based doublet identifications were conducted by different researchers in a blinded fashion. This resulted in a sensitivity of 66% and a specificity of 99.1% for multiplet detection, and a concordance of 87.5% between one- and two-color imaging. Some false negatives likely arise from imperfections in our scRNA-Seq “ground truth” and relatively low-resolution imaging. We anticipate that more sophisticated image processing, better microscopy, and better cell/nucleus stains will lead to further improvements in sensitivity, making SCOPE-Seq highly effective for detecting multiplets.
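The multiplet call and the two detection metrics can be sketched in a few lines of Python. The sketch assumes that species purity is the fraction of uniquely aligned reads that come from the majority species, which is a common convention but is not spelled out here; the function names and read counts are hypothetical.

```python
from typing import Sequence, Tuple


def species_purity(human_reads: int, mouse_reads: int) -> float:
    """Fraction of reads from the majority species (assumed definition of purity)."""
    total = human_reads + mouse_reads
    if total == 0:
        raise ValueError("profile has no aligned reads")
    return max(human_reads, mouse_reads) / total


def is_multiplet(human_reads: int, mouse_reads: int, threshold: float = 0.70) -> bool:
    """Call a human-mouse multiplet when species purity falls below the threshold."""
    return species_purity(human_reads, mouse_reads) < threshold


def sensitivity_specificity(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Sensitivity and specificity of imaging-based calls against sequencing-based truth."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    fn = sum(t and not p for t, p in zip(truth, predicted))
    tn = sum(not t and not p for t, p in zip(truth, predicted))
    fp = sum(not t and p for t, p in zip(truth, predicted))
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true multiplet and one true singlet")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    print(is_multiplet(600, 400))  # purity 0.60 is below 0.70
    print(is_multiplet(950, 50))   # purity 0.95 is above 0.70
    truth = [True, True, False, False]      # toy sequencing-based calls
    predicted = [True, False, False, False]  # toy imaging-based calls
    print(sensitivity_specificity(truth, predicted))
```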

Additional imaging features obtained from cells may carry information related to gene expression. We measured 19 imaging features from the fluorescence images of cells reflecting various aspects of cell size, shape, and intensity distribution (Additional file 1: Supplementary Methods). Principal component (PC) analysis on these features suggests significant heterogeneity among the cells (Fig. 2c). The first PC is primarily correlated with cell size-related features (Fig. 2d) and the second with shape-related features (Fig. 2e). We then ranked protein coding genes by their correlation with these PCs and performed gene set enrichment analysis for each PC (Fig. 2f). Perhaps not surprisingly, we found that cell division-related genes are most correlated with the first PC, which is related to cell size, likely because actively dividing cells and mitotic figures tend to be larger. For the second PC, which reflects cell shape, we found enrichment of extracellular matrix and vacuolar genes; this enrichment may result from cells in the process of adhering to the microwells and from the impact of vacuoles on overall cell morphology. We made very similar observations of associations between gene ontology and imaging features using partial least squares regression analysis on the same data (Additional file 1: Supplementary Methods, Additional file 4: Table S1).
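The gene-ranking step that feeds the enrichment analysis can be illustrated with a small Python sketch. It assumes that a PC score per cell has already been computed from the imaging features and that Pearson correlation is the ranking statistic, which is an assumption rather than a stated detail; the gene names and values are made up for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector has no variance."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return cov / (norm_x * norm_y)


def rank_genes_by_pc(
    pc_scores: Sequence[float], expression: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank genes from most positively to most negatively correlated with a PC."""
    correlations = [(gene, pearson(pc_scores, values)) for gene, values in expression.items()]
    return sorted(correlations, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    pc1 = [1.0, 2.0, 3.0, 4.0]  # toy PC scores for four cells
    toy_expression = {           # hypothetical genes and expression values
        "gene_up": [2.0, 4.0, 6.0, 8.0],
        "gene_down": [4.0, 3.0, 2.0, 1.0],
        "gene_flat": [5.0, 5.0, 5.0, 5.0],
    }
    for gene, r in rank_genes_by_pc(pc1, toy_expression):
        print(gene, round(r, 3))
```

A ranked list of this kind is the usual input to a pre-ranked gene set enrichment analysis.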

Many cellular phenotypes are difficult to infer directly from static measurements of the transcriptome, such as protein expression and localization dynamics, organelle dynamics and distribution, morphological features, uptake of foreign objects, and biomolecular secretion. However, there are myriad live cell imaging and microwell-based assays for characterizing these phenotypes in individual cells. We expect that SCOPE-Seq will serve as a highly scalable, accurate, and economical approach to linking live cell microscopy assays to scRNA-Seq, enabling investigation of the transcriptional underpinnings of the resulting phenotypes. In addition, with the multiplet detection capability, one can potentially achieve approximately fivefold higher throughput by more aggressive cell loading, although this is subject to the caveat that background from molecular cross-talk could increase.

The current implementation of SCOPE-Seq has important limitations, motivating future improvements. The use of separate optical and sequencing barcodes requires an extra step: sequencing of bead-free libraries. Future implementations will use oligonucleotides in which the optical and sequencing barcodes are the same DNA sequence. OBO ligation to a subset of mRNA capture sites on the beads also leads to reduced molecular capture efficiency for the corresponding scRNA-Seq libraries relative to beads without optical barcodes (Additional file 1: Figure S1). Finally, the multiplexing capacity of the beads in our proof-of-concept experiment is relatively modest, which precludes an error-correcting code. While our current linking accuracy is high, both the yield of optically demultiplexed cells and accuracy could be improved with such a code.

The scheme presented here for linking cellular phenotypes to sequencing could also be generalized beyond single-cell analysis. Microscopy experiments on small multi-cellular organisms, organoids, or colonies could be linked with sequencing. Additionally, the optically decodable bead array could be used for spatial transcriptomic analysis as demonstrated previously with printed microarrays [15]. One advantage of optically decodable beads over printed microarrays [15, 16] is that beads can be prepared in a large batch for use in many experiments starting from commercially available “Drop-seq” beads with relatively simple tube-based reactions. We hope that this economical approach will serve as a powerful tool for connecting high-throughput microscopy and sequencing on multiple scales.