For sample and single-cell library preparation, follow all standard precautions for RNA and single-cell work, for example, dedicated space, pipettes, and reagents, and the use of filter tips and nuclease-free reagents.
3.1 Sample Preparation
Grow yeast in media of interest, until density of interest is reached (see Notes 1, 2, and 6).
Freeze 0.5 mL of cell culture mixed with 0.5 mL of ice-cold 50% glycerol at −80 °C until droplet generation using the 10× Genomics Chromium device (see Notes 7 and 8).
3.2 Single-Cell Library Preparation
To accommodate in-droplet spheroplasting and lysis of the yeast cells, a Zymolyase solution is added to the reverse transcriptase mix before running the Chromium device when using version 2.0 or 3.0 of the 10× kit and protocol. For version 3.1, cells take up to 18 min at room temperature to be encapsulated with beads (rather than 6 to 8.5 min for v2.0–3.0), so Zymolyase in the master mix causes premature lysis of cells and high levels of background mRNA. Adding the Zymolyase solution to the GEM beads therefore works better for v3.1. This approach may also work for v2.0 and v3.0, but we have not tested this, as older versions are being phased out by 10× and are not readily available.
Thaw cell cultures on ice, measure the cell count, and pellet the cells (900 × g, 3 min, 4 °C) (see Note 9). Resuspend in ice-cold filter-sterilized PBS to reach the desired cell count, according to the 10× Genomics Chromium Single Cell protocol (see Note 10). To keep the proportion of droplets that contain more than one cell low, it is advisable to aim for 1000 to 2000 recovered cells.
From here on, follow the well-documented protocol of the 10× Genomics Chromium device, except for adding the Zymolyase solution to the single-cell master mix (v2.0 and 3.0) or to the beads (v3.1). To add the Zymolyase solution to the single-cell master mix in v2.0 and 3.0 (which is essentially the reverse transcription master mix), replace 1 μL of water in the master mix with Zymolyase stock solution at the appropriate concentration. For v2.0, the total reaction volume is 100 μL, and the Zymolyase stock solution should be at 100× (100 mg/mL). For v3.0, the total reaction volume is 80 μL, hence the Zymolyase stock solution should only be 80× (80 mg/mL). Then, prepare the microfluidic Chromium Single Cell Chip, add the cell suspension to the single-cell master mix, and immediately transfer to the microfluidics chip. Add hydrogel beads and partitioning oil according to the protocol, and commence encapsulating the cells with hydrogel beads using the 10× Genomics Chromium device.
To add the Zymolyase solution to the beads (v3.1, possibly earlier versions), defrost 70 μL aliquots of hydrogel beads, remove 1 μL of beads, and add 1 μL of 70× Zymolyase. Then vortex the beads for 30 s, spin down for 5 s, and load 50 μL of beads into the appropriate well according to the Chromium v3.1 protocol. Add cell suspension and reverse transcription master mix to the chip according to the protocol, and commence encapsulating the cells with hydrogel beads.
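The stock concentrations in the two steps above all follow the same dilution arithmetic, which can be checked with a short R sketch. The function name final_conc and the table layout are illustrative only, and the 70 mg/mL value for the 70× stock is an assumption extrapolated from the 100× stock being 100 mg/mL.

```r
# Final concentration after spiking a small stock volume into a larger mix.
final_conc <- function(stock_mg_per_ml, stock_ul, total_ul) {
  stock_mg_per_ml * stock_ul / total_ul
}

zymolyase <- data.frame(
  version         = c("v2.0", "v3.0", "v3.1"),
  stock_mg_per_ml = c(100, 80, 70),  # 70 mg/mL is an assumption (70x stock)
  stock_ul        = 1,               # 1 uL of stock replaces 1 uL of water or beads
  total_ul        = c(100, 80, 70)   # master mix volume (v2.0, v3.0) or bead aliquot (v3.1)
)
zymolyase$final_mg_per_ml <- with(
  zymolyase, final_conc(stock_mg_per_ml, stock_ul, total_ul)
)
print(zymolyase)
```

Under these assumptions, every kit version ends up at the same 1× working concentration (1 mg/mL), which is why the stock strength tracks the total volume.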
After emulsification, perform reverse transcription, cleanup, cDNA amplification and library construction (see Note 11) according to the protocol.
After library construction and QC, sequence the prepared libraries (see Note 12).
3.3 Cell Ranger Pipeline
Cell Ranger consists of a set of analysis pipelines developed by 10× Genomics (downloadable from the 10× Genomics website) to process the single-cell RNA-sequencing output (Fig. 1). For some commands, it is useful to set limits on memory and CPU usage. We used a 64-bit Linux machine with 32 GB of RAM and an Intel Core i7-4800 CPU to execute the following analysis pipeline.
3.3.1 Demultiplex Raw Base Call (BCL) into FASTQ Files
Create a sample sheet, as an input comma-separated (csv) file, containing the following three columns: the sequencer lane number (“Lane”), the sample name (“Sample”) and the index position (used in the library construction) from the GEM well of the 10× 96-well plate (“Index”) (see Note 13).
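The sample sheet described above could be generated as follows. This is a minimal sketch in R: the lane numbers, sample names, index names (SI-GA-A1, SI-GA-B1), and output path are placeholders, not values from this protocol.

```r
# Three-column sample sheet: sequencer lane, sample name, sample index well.
sample_sheet <- data.frame(
  Lane   = c(1, 1),
  Sample = c("sample_name_1", "sample_name_2"),
  Index  = c("SI-GA-A1", "SI-GA-B1")  # placeholder index names
)

# Written to a temporary directory here; point this at your own path.
csv_path <- file.path(tempdir(), "samplesheet.csv")
write.csv(sample_sheet, csv_path, row.names = FALSE, quote = FALSE)

# Show the resulting file content.
cat(readLines(csv_path), sep = "\n")
```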
Convert raw BCL files from the Illumina sequencer to FASTQ files, separated for each sample and created in the “outs” directory, using “cellranger mkfastq” (see Note 14). You can set the maximum memory (in GB) the pipeline may request using the “localmem” option.
$ cellranger mkfastq --run=/path/to/folder_with_BCL_files/ \
3.3.2 Mapping to Reference Genome and UMI Counting
Build a custom yeast reference genome package using “cellranger mkref.” The following input files are needed: (1) a FASTA file containing the genomic DNA sequence of each chromosome and (2) a GTF file annotating the feature loci on the reference genome (see Note 15).
$ cellranger mkref --genome=S288C \
--fasta=/path/to/fastafile.fa \
Make a separate directory per sample to generate single cell feature counts. Using ‘cellranger count’, map the reads to the reference genome, create bam files and carry out UMI (Unique Molecular Identifier, represents absolute number of observed transcripts) counting for each gene (see Notes 16, 17, 18, and 19).
$ cellranger count --id=unique_run_ID_name \
3.3.3 Aggregate Data from Multiple Samples
Create an input csv file that lists the “cellranger count” outputs to aggregate. It contains one row per sample, with a sample name (“library_id”) and the path to the output of the “cellranger count” command (“molecule_h5”) (see Note 20).
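A minimal R sketch of such an aggregation csv is shown below. The run directory names and paths are placeholders, and the file name molecule_info.h5 assumes the standard layout of the “outs” directory produced by “cellranger count”.

```r
# One row per sample: a sample name and the path to its molecule info file.
aggr_sheet <- data.frame(
  library_id  = c("sample_name_1", "sample_name_2"),
  molecule_h5 = c("/path/to/run_1/outs/molecule_info.h5",  # placeholder paths
                  "/path/to/run_2/outs/molecule_info.h5")
)

aggr_path <- file.path(tempdir(), "aggr_samples.csv")
write.csv(aggr_sheet, aggr_path, row.names = FALSE, quote = FALSE)
cat(readLines(aggr_path), sep = "\n")
```

The row order matters later on: the numerical suffix appended to each barcode after aggregation follows the order of the rows in this file.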
Aggregate the counts from the different samples using “cellranger aggr” (see Note 21) to create multiple output files, among which the filtered feature-barcode matrices MEX file.
$ cellranger aggr --id=unique_run_ID_string \
3.4 Secondary Analysis in R
There are multiple tools for secondary data analysis, including Monocle, Seurat, Scanpy, and Cell Ranger R. We use the Seurat R package for secondary analysis (see Note 22), the input of which is the output from the Cell Ranger pipeline (Fig. 1).
3.4.1 Read-In of the Data, QC and Filtering, Normalization
Read the output generated in the last step of the Cell Ranger pipeline (more specifically the filtered feature-barcode matrices MEX) into a matrix, using the ‘Read10X’ function of the Seurat package.
data_merge <- Read10X(data.dir='/path/to/filtered_feature_bc_matrix')
This creates a data matrix where each column corresponds to a single cell, denoted by a molecular barcode followed by a numerical index (indicating which of the samples this particular cell comes from), for example, “AAACCTGAGTGATCGG-1”. Each row in this matrix corresponds to a gene based on the exon annotation of the .gtf file used in mkref, for example, “COS7”.
Create a Seurat object from this matrix and assign sample names (based on the numerical index) through a metadata column. We assigned sample names based on the time point of sampling and therefore introduced “timepoint” in the metadata. This can be adapted to the specific setup of the experiment. When creating the Seurat object, it can be useful to prefilter genes that are detected in no cells and barcodes (potential cells) with fewer than, for example, 5 detected genes, using “min.cells” and “min.features,” respectively.
min_cells = 1
min_features = 5
data_merge_obj <- CreateSeuratObject(counts=data_merge, min.cells=min_cells, min.features=min_features)
data_merge_obj@meta.data$timepoint <- 'na'
# Match on the barcodes kept in the Seurat object, since prefiltering may have removed some
data_merge_obj@meta.data[grepl('-1$', colnames(data_merge_obj)), 'timepoint'] <- 'sample_name_1'
data_merge_obj@meta.data[grepl('-2$', colnames(data_merge_obj)), 'timepoint'] <- 'sample_name_2'
data_merge_obj@meta.data[grepl('-3$', colnames(data_merge_obj)), 'timepoint'] <- 'sample_name_3'
# Repeat for each further sample index x
data_merge_obj@meta.data[grepl('-x$', colnames(data_merge_obj)), 'timepoint'] <- 'sample_name_x'
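With many samples, the same assignment can be written as a lookup on the barcode suffix. The following base R sketch uses made-up barcodes and a hypothetical lookup vector named sample_names. It illustrates the idea and is not the code used in this protocol.

```r
# Example barcodes as produced after aggregation: <sequence>-<sample index>
barcodes <- c("AAACCTGAGTGATCGG-1", "AAACGGGTCAGCTCTC-2", "AAAGATGCATTGGCGC-3")

# Hypothetical lookup table: sample index -> sample name
sample_names <- c("1" = "sample_name_1",
                  "2" = "sample_name_2",
                  "3" = "sample_name_3")

# Strip everything up to the last hyphen to obtain the sample index
suffix <- sub(".*-", "", barcodes)

# Look up the sample name; unknown indices fall back to 'na'
timepoint <- unname(sample_names[suffix])
timepoint[is.na(timepoint)] <- "na"
print(data.frame(barcode = barcodes, timepoint = timepoint))
```

In a real analysis, barcodes would be colnames(data_merge_obj), and the resulting vector could be stored with data_merge_obj$timepoint <- timepoint.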
Plot and inspect major QC metrics (percentage of mitochondrial reads, possible doublets, etc.) through violin plots of distribution values. Seurat automatically calculates QC metrics (in the metadata: nCount_RNA is the number of UMIs per cell, nFeature_RNA is the number of features per cell). The proportion of mitochondrial reads could be an important metric for downstream analyses and interpretation, but needs to be annotated and calculated manually. In S. cerevisiae, mitochondrial gene names start with ‘Q’, hence we first define all those genes starting with ‘Q’ as mitochondrial, and add this to the metadata (see Note 23).
data_merge_obj[['percent.mt']] <- PercentageFeatureSet(data_merge_obj, pattern = '^Q')
plot1 <- VlnPlot(data_merge_obj, features = c('nFeature_RNA', 'nCount_RNA'), ncol = 2, group.by='timepoint')
The thresholds for filtering out cells can be determined by looking at the violin plots. We provide an example based on previously published data, in which yeast undergoing a shift from glucose to maltose were studied. High numbers of detected genes (nFeature_RNA) could indicate possible doublets. Therefore, the threshold for the maximum number of genes detected per cell was set to 2000 for sample 1 (Fig. 2) (see Note 24). These thresholds can, within reason, be determined per sample.
# Keep cells below the per-sample maximum number of detected genes
data_merge_obj <- data_merge_obj[,
  (grepl('-1$', colnames(data_merge_obj)) & data_merge_obj$nFeature_RNA < 2000) |
  (grepl('-2$', colnames(data_merge_obj)) & data_merge_obj$nFeature_RNA < 2000) |
  (grepl('-3$', colnames(data_merge_obj)) & data_merge_obj$nFeature_RNA < 1500) |
  (grepl('-4$', colnames(data_merge_obj)) & data_merge_obj$nFeature_RNA < 1500) |
  (grepl('-5$', colnames(data_merge_obj)) & data_merge_obj$nFeature_RNA < 2000)]
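The per-sample filter above can also be expressed with a named vector of thresholds, which is easier to maintain as the number of samples grows. The sketch below runs on a toy metadata table in base R. The barcodes and counts are invented, and the name max_features is an illustrative choice.

```r
# Toy metadata: barcode (with sample index suffix) and detected genes per cell
meta <- data.frame(
  barcode      = c("AAAC-1", "AAAG-1", "AACC-3", "AACG-3", "AAGT-5"),
  nFeature_RNA = c(1200, 2500, 1400, 1600, 1999)
)

# Maximum number of detected genes allowed per sample index
max_features <- c("1" = 2000, "2" = 2000, "3" = 1500, "4" = 1500, "5" = 2000)

# Look up each cell's threshold through its sample index, then compare
sample_index <- sub(".*-", "", meta$barcode)
keep <- unname(meta$nFeature_RNA < max_features[sample_index])

print(meta[keep, ])
```

On a Seurat object, the same logical vector could be built from colnames(data_merge_obj) and data_merge_obj$nFeature_RNA, and then used as data_merge_obj[, keep].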
Normalize the data using the default global-scaling “LogNormalize” option, to correct for differences in capture efficiency between cells and differences in read depth between samples (see Note 25).
data_merge_obj <- NormalizeData(data_merge_obj)
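To make the normalization concrete, the following base R sketch reproduces the “LogNormalize” calculation on a toy count matrix: counts are divided by the total counts of their cell, multiplied by a scale factor (10,000 is the Seurat default), and transformed with log1p. The gene and cell names are made up for illustration.

```r
# Toy UMI counts: rows are genes, columns are cells.
# cell2 has the same composition as cell1 but twice the sequencing depth.
counts <- matrix(
  c(1, 3, 0, 6,
    2, 6, 0, 12),
  nrow = 4,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), c("cell1", "cell2"))
)

scale_factor <- 1e4  # Seurat's default scale.factor

# Divide each column by its total, multiply by the scale factor, then log1p
log_norm <- log1p(sweep(counts, 2, colSums(counts), "/") * scale_factor)
print(round(log_norm, 3))
```

Both cells end up with identical normalized values, illustrating how the scaling removes differences in depth between cells.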
Detect highly variable genes across the single cells using “FindVariableFeatures,” using the default selection method option “vst.” The selection method defines how to choose the top variable features. You can decide on “nfeatures” depending on how many variable genes you want to use to cluster the cells. The downstream analysis will focus on these genes which will help highlight biological signals in the data.
data_merge_obj <- FindVariableFeatures(data_merge_obj, selection.method = 'vst', nfeatures = 300)
To improve downstream dimensionality reduction and clustering, we need to scale (linearly transform) the data, in order to prevent highly expressed genes from dominating the next steps of the analysis (see Note 26).
data_merge_obj <- ScaleData(data_merge_obj, verbose = FALSE)
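The effect of scaling can be illustrated with a small base R example: each gene is centered to mean 0 and scaled to unit variance across cells, so a highly expressed gene no longer outweighs a lowly expressed one. The gene names and values below are invented, and the sketch leaves out additional options that ScaleData offers.

```r
# Toy normalized expression: rows are genes, columns are three cells
expr <- rbind(
  high_gene = c(100, 200, 300),
  low_gene  = c(1, 2, 3)
)

# scale() works on columns, so transpose to scale each gene, then transpose back
scaled <- t(scale(t(expr)))
print(scaled)
```

After scaling, both genes show the same pattern (-1, 0, 1) and therefore contribute equally to the subsequent PCA.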
3.4.2 Dimensional Reduction and Visual Exploration of Data
Run linear dimensional reduction using “RunPCA” on the scaled data (see Note 27).
data_merge_obj <- RunPCA(data_merge_obj, npcs = 20, verbose = FALSE)
Run UMAP on the dimensionally reduced data, and visualize using “DimPlot” (Fig. 3). You can also visualize the data with cells colored by a quantitative feature (such as expression levels of specific genes/growth markers) using “FeaturePlot” (Fig. 4).
data_merge_obj <- RunUMAP(data_merge_obj, reduction = 'pca', dims = 1:20 , n.epochs = 1000)
DimPlot(data_merge_obj, reduction = 'umap', group.by = 'timepoint', pt.size = 0.1)
FeaturePlot(data_merge_obj, c('GENE_NAME1','GENE_NAME2','GENE_NAMEX'), pt.size = 0.5, cols = c('gray50','red'), reduction= 'umap')
3.4.3 Clustering and Differential Gene Expression
Cluster the cells using a graph-based approach. This method first calculates the k-nearest neighbors of each cell, then constructs the shared nearest neighbor (SNN) graph. The resolution parameter (see Note 28) influences the number of clusters, with higher values giving more clusters (Fig. 5).
data_merge_obj <- FindNeighbors(data_merge_obj, dims = 1:20)
data_merge_obj <- FindClusters(data_merge_obj, resolution = 0.1)
DimPlot(data_merge_obj, pt.size = 0.1)
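The idea behind the shared nearest neighbor graph can be sketched in base R on a toy embedding. Each cell gets its k nearest neighbors, and the edge weight between two cells is the Jaccard overlap of their neighbor sets. This is a simplified illustration under stated assumptions (six invented cells, k = 3, each cell counted as its own neighbor), not a reimplementation of FindNeighbors.

```r
# Toy 2-D embedding: two well-separated groups of three cells each
coords <- rbind(c1 = c(0, 0),   c2 = c(0, 1),   c3 = c(1, 0),
                c4 = c(10, 10), c5 = c(10, 11), c6 = c(11, 10))
k <- 3  # neighborhood size, including the cell itself

# Pairwise Euclidean distances between cells
d <- as.matrix(dist(coords))

# k nearest neighbors of each cell: a k x n matrix, one column per cell
knn <- apply(d, 1, function(row) names(sort(row))[seq_len(k)])

# Jaccard overlap between two neighbor sets
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

cells <- rownames(coords)
snn <- outer(cells, cells,
             Vectorize(function(i, j) jaccard(knn[, i], knn[, j])))
dimnames(snn) <- list(cells, cells)
print(snn)
```

Cells within a group share all their neighbors (weight 1) and cells from different groups share none (weight 0). Community detection on such a graph then separates the two groups.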
Find specific biomarkers that define clusters via differential expression. The parameter “logfc.threshold” is used to limit the testing to genes which show a specific fold-difference between the groups of cells. You can test for markers of a specific cluster (specified in ident.1) or you can find markers distinguishing clusters from each other (e.g., ident.1 = 2, ident.2 = 3 in this example will find markers differing between cluster 2 and cluster 3). You can further limit the list of markers by extracting only genes with specific average log-fold changes or adjusted p-values you define. Export as a csv-file.
diff_exp_res <- FindMarkers(data_merge_obj, ident.1 = 2, ident.2 = 3, logfc.threshold = 0.01)
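# In Seurat v4 and later, this column is named avg_log2FC instead of avg_logFC
diff_exp_res <- diff_exp_res[abs(diff_exp_res$avg_logFC) > 0.25 & diff_exp_res$p_val_adj < 0.05, ]
# Keep the row names, since they hold the gene names
write.csv(diff_exp_res, 'diff_exp_res.csv', row.names = TRUE)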
Upload the gene list from this csv-file to a Gene Ontology webtool, such as GOrilla (http://cbl-gorilla.cs.technion.ac.il/), to identify and visualize enriched GO terms in your gene list.