Prokaryotic single-cell RNA sequencing by in situ combinatorial indexing

  • Letter
From Nature Microbiology

Abstract

Despite longstanding appreciation of gene expression heterogeneity in isogenic bacterial populations, affordable and scalable technologies for studying single bacterial cells have been limited. Although single-cell RNA sequencing (scRNA-seq) has revolutionized studies of transcriptional heterogeneity in diverse eukaryotic systems1,2,3,4,5,6,7,8,9,10,11,12,13, the application of scRNA-seq to prokaryotes has been hindered by their extremely low mRNA abundance14,15,16, lack of mRNA polyadenylation and thick cell walls17. Here, we present prokaryotic expression profiling by tagging RNA in situ and sequencing (PETRI-seq)—a low-cost, high-throughput prokaryotic scRNA-seq pipeline that overcomes these technical obstacles. PETRI-seq uses in situ combinatorial indexing11,12,18 to barcode transcripts from tens of thousands of cells in a single experiment. PETRI-seq captures single-cell transcriptomes of Gram-negative and Gram-positive bacteria with high purity and low bias, with median capture rates of more than 200 mRNAs per cell for exponentially growing Escherichia coli. These characteristics enable robust discrimination of cell states corresponding to different phases of growth. When applied to wild-type Staphylococcus aureus, PETRI-seq revealed a rare subpopulation of cells undergoing prophage induction. We anticipate that PETRI-seq will have broad utility in defining single-cell states and their dynamics in complex microbial communities.

Fig. 1: Overview of PETRI-seq.
Fig. 2: PETRI-seq captures transcriptomes of single E. coli and S. aureus cells with high purity and low bias.
Fig. 3: PCA distinguishes between exponential- and stationary-phase single E. coli cells through mRNA expression patterns.

Data availability

Raw data have been submitted to the Gene Expression Omnibus under accession number GSE141018. Source data are also provided for all figures. All of the figures except for Fig. 1 include original data. An overview of all of the experiments is provided in Supplementary Table 4. A count matrix for the three primary PETRI-seq experiments is provided in Supplementary Table 6.

Code availability

Relevant code for this manuscript is available from the corresponding author on request; current PETRI-seq code and protocols are available at https://tavazoielab.c2b2.columbia.edu/PETRI-seq/.

References

  1. Tang, F. et al. mRNA-seq whole-transcriptome analysis of a single cell. Nat. Methods 6, 377–382 (2009).

  2. Ramsköld, D. et al. Full-length mRNA-seq from single-cell levels of RNA and individual circulating tumor cells. Nat. Biotechnol. 30, 777–782 (2012).

  3. Picelli, S. et al. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat. Methods 10, 1096–1098 (2013).

  4. Fan, H. C., Fu, G. K. & Fodor, S. P. A. Expression profiling. Combinatorial labeling of single cells for gene expression cytometry. Science 347, 1258367 (2015).

  5. Macosko, E. Z. et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell 161, 1202–1214 (2015).

  6. Klein, A. M. et al. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell 161, 1187–1201 (2015).

  7. Bose, S. et al. Scalable microfluidics for single-cell RNA printing and sequencing. Genome Biol. 16, 120 (2015).

  8. Zheng, G. X. et al. Massively parallel digital transcriptional profiling of single cells. Nat. Commun. 8, 14049 (2017).

  9. Picelli, S. Single-cell RNA-sequencing: the future of genome biology is now. RNA Biol. 14, 637–650 (2016).

  10. Sheng, K., Cao, W., Niu, Y., Deng, Q. & Zong, C. Effective detection of variation in single-cell transcriptomes using MATQ-seq. Nat. Methods 14, 267–270 (2017).

  11. Cao, J. et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science 357, 661–667 (2017).

  12. Rosenberg, A. B. et al. Single-cell profiling of the developing mouse brain and spinal cord with split-pool barcoding. Science 360, 176–182 (2018).

  13. Cao, J. et al. The single-cell transcriptional landscape of mammalian organogenesis. Nature 566, 496–502 (2019).

  14. Taniguchi, Y. et al. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science 329, 533–538 (2010).

  15. Bartholomäus, A. et al. Bacteria differently regulate mRNA abundance to specifically respond to various stresses. Philos. Trans. R. Soc. A 374, 20150069 (2016).

  16. Moran, M. A. et al. Sizing up metatranscriptomics. ISME J. 7, 237–243 (2013).

  17. de Lange, N., Tran, T. M. & Abate, A. R. Electrical lysis of cells for detergent-free droplet assays. Biomicrofluidics 10, 024114 (2016).

  18. Amini, S. et al. Haplotype-resolved whole-genome sequencing by contiguity-preserving transposition and combinatorial indexing. Nat. Genet. 46, 1343–1349 (2014).

  19. Hodson, R. E., Dustman, W. A., Garg, R. P. & Moran, M. A. In situ PCR for visualization of microscale distribution of specific genes and gene products in prokaryotic communities. Appl. Environ. Microbiol. 61, 4074–4082 (1995).

  20. Bloom, J. D. Estimating the frequency of multiplets in single-cell RNA sequencing from cell-mixing experiments. PeerJ 6, e5578 (2018).

  21. Okayama, H. & Berg, P. High-efficiency cloning of full-length cDNA. Mol. Cell. Biol. 2, 161–170 (1982).

  22. Kivioja, T. et al. Counting absolute number of molecules using unique molecular identifiers. Nat. Methods 9, 72–74 (2012).

  23. Yang, S. et al. Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol. 21, 57 (2020).

  24. Young, M. D. & Behjati, S. SoupX removes ambient RNA contamination from droplet based single-cell RNA sequencing data. Preprint at bioRxiv https://doi.org/10.1101/303727 (2020).

  25. Hotelling, H. Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441 (1933).

  26. Hafemeister, C. & Satija, R. Normalization and variance stabilization of single-cell RNA-seq data using regularized negative binomial regression. Genome Biol. 20, 296 (2019).

  27. Gentry, D. R., Hernandez, V. J., Nguyen, L. H., Jensen, D. B. & Cashel, M. Synthesis of the stationary-phase sigma factor σs is positively regulated by ppGpp. J. Bacteriol. 175, 7982–7989 (1993).

  28. Almirón, M., Link, A. J., Furlong, D. & Kolter, R. A novel DNA-binding protein with regulatory and protective roles in starved Escherichia coli. Genes Dev. 6, 2646–2654 (1992).

  29. Traxler, M. F. et al. The global, ppGpp-mediated stringent response to amino acid starvation in Escherichia coli. Mol. Microbiol. 68, 1128–1148 (2008).

  30. Chen, H., Shiroguchi, K., Ge, H. & Xie, X. S. Genome-wide study of mRNA degradation and transcript elongation in Escherichia coli. Mol. Syst. Biol. 11, 781 (2015).

  31. Vargas-Garcia, C. A., Ghusinga, K. J. & Singh, A. Cell size control and gene expression homeostasis in single-cells. Curr. Opin. Syst. Biol. 8, 109–116 (2018).

  32. Diep, B. A. et al. Complete genome sequence of USA300, an epidemic clone of community-acquired meticillin-resistant Staphylococcus aureus. Lancet 367, 731–739 (2006).

  33. Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J. & Wheeler, D. L. GenBank. Nucleic Acids Res. 35, D21–D25 (2007).

  34. Saint, M. et al. Single-cell imaging and RNA sequencing reveal patterns of gene expression heterogeneity during fission yeast growth and adaptation. Nat. Microbiol. 4, 480–491 (2019).

  35. Grün, D., Kester, L. & van Oudenaarden, A. Validation of noise models for single-cell transcriptomics. Nat. Methods 11, 637–640 (2014).

  36. Raj, A., van den Bogaard, P., Rifkin, S. A., van Oudenaarden, A. & Tyagi, S. Imaging individual mRNA molecules using multiple singly labeled probes. Nat. Methods 5, 877–879 (2008).

  37. Abraham, J. M., Freitag, C. S., Clements, J. R. & Eisenstein, B. I. An invertible element of DNA controls phase variation of type 1 fimbriae of Escherichia coli. Proc. Natl Acad. Sci. USA 82, 5724–5727 (1985).

  38. Deutsch, D. R. et al. Extra-chromosomal DNA sequencing reveals episomal prophages capable of impacting virulence factor expression in Staphylococcus aureus. Front. Microbiol. 9, 1406 (2018).

  39. Balasubramanian, S., Osburne, M. S., BrinJones, H., Tai, A. K. & Leong, J. M. Prophage induction, but not production of phage particles, is required for lethal disease in a microbiome-replete murine model of enterohemorrhagic E. coli infection. PLoS Pathog. 15, e1007494 (2019).

  40. Blattman, S. B., Jiang, W., Oikonomou, P. & Tavazoie, S. Prokaryotic single-cell RNA sequencing by in situ combinatorial indexing. Preprint at bioRxiv https://doi.org/10.1101/866244 (2019).

  41. Kuchina, A. et al. Microbial single-cell RNA sequencing by split-pool barcoding. Preprint at bioRxiv https://doi.org/10.1101/869248 (2019).

  42. Brauner, A., Fridman, O., Gefen, O. & Balaban, N. Q. Distinguishing between resistance, tolerance and persistence to antibiotic treatment. Nat. Rev. Microbiol. 14, 320–330 (2016).

  43. Girgis, H. S., Harris, K. & Tavazoie, S. Large mutational target size for rapid emergence of bacterial persistence. Proc. Natl Acad. Sci. USA 109, 12740–12745 (2012).

  44. Franzosa, E. A. et al. Sequencing and beyond: integrating molecular ‘omics’ for microbial community profiling. Nat. Rev. Microbiol. 13, 360–372 (2015).

  45. Lee, T. S. et al. BglBrick vectors and datasheets: a synthetic biology platform for gene expression. J. Biol. Eng. 5, 12 (2011).

  46. Zaslaver, A. et al. A comprehensive library of fluorescent transcriptional reporters for Escherichia coli. Nat. Methods 3, 623–628 (2006).

  47. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBNet 17, 10–12 (2011).

  48. Smith, T., Heger, A. & Sudbery, I. UMI-tools: modelling sequencing errors in unique molecular Identifiers to improve quantification accuracy. Genome Res. 27, 491–499 (2017).

  49. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  50. Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).

  51. Santos-Zavaleta, A. et al. RegulonDB v 10.5: tackling challenges to unify classic and high throughput knowledge of gene regulation in E. coli K-12. Nucleic Acids Res. 47, D212–D220 (2019).

  52. Taboada, B., Ciria, R., Martinez-Guerrero, C. E. & Merino, E. ProOpDB: Prokaryotic Operon DataBase. Nucleic Acids Res. 40, D627–D631 (2012).

  53. Fu, G. K., Hu, J., Wang, P. & Fodor, S. P. A. Counting individual DNA molecules by the stochastic attachment of diverse labels. Proc. Natl Acad. Sci. USA 108, 9026–9031 (2011).

  54. Tange, O. GNU Parallel 2018 (Ole Tange, 2018).

  55. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  56. Butler, A., Hoffman, P., Smibert, P., Papalexi, E. & Satija, R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat. Biotechnol. 36, 411–420 (2018).

  57. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).

  58. Huang, Y., Sheth, R. U., Kaufman, A. & Wang, H. H. Scalable and cost-effective ribonuclease-based rRNA depletion for transcriptomics. Nucleic Acids Res. 48, e20 (2020).

  59. Armour, C. D. et al. Digital transcriptome profiling using selective hexamer priming for cDNA synthesis. Nat. Methods 6, 647–649 (2009).

  60. He, S. et al. Validation of two ribosomal RNA removal methods for microbial metatranscriptomics. Nat. Methods 7, 807–812 (2010).

  61. Zhulidov, P. A. et al. Simple cDNA normalization using kamchatka crab duplex-specific nuclease. Nucleic Acids Res. 32, e37 (2004).


Acknowledgements

We thank the members of the Tavazoie laboratory for discussions and comments on early drafts of the manuscript; and P. Sims for suggestions during the early development of PETRI-seq. S.T. is supported by award no. 5R01AI077562 from the National Institutes of Health. S.B.B. is supported by a National Science Foundation Graduate Research Fellowship (no. DGE 16-44869). W.J. is supported by a fellowship from the Jane Coffin Childs Fund.

Author information

Contributions

W.J., S.B.B. and S.T. conceived the study. S.B.B., W.J. and S.T. designed experiments. S.B.B. and W.J. performed experiments and data analysis. P.O. assisted with computational analysis. S.B.B., W.J. and S.T. wrote the paper.

Corresponding author

Correspondence to Saeed Tavazoie.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Experimental and computational pipelines for PETRI-seq.

a–c, Experimental pipeline for PETRI-seq. PETRI-seq libraries can be prepared in just 2.5 days. (a) Detailed schematic of steps for cell preparation, which is started at the end of day 1 and finished on day 2. (b) Detailed schematic of steps for split-pool barcoding, which is entirely done on day 2. (c) Detailed schematic of steps for library preparation, which can be completed (up to sequencing) on day 3 (or later, if preferred). d, Computational pipeline for PETRI-seq analysis after sequencing. e, Structure of contig elements in read 1 after Illumina sequencing of PETRI-seq. To reduce the length of the sequence, barcodes overlap by one base (indicated by asterisk) with the adjacent linker sequence. f, Representative ‘knee plot’ used to select BCs for further analysis. The threshold line at 25,000 BCs is inclusive to facilitate additional filtering after collapsing PCR duplicates to UMIs. g, Representative histogram of reads per UMI. A threshold line was set for each library. For this library, only UMIs with more than 3 reads were kept for downstream analysis. Threshold line at log10(3). h, Species mixing plot with all BCs containing >0 UMIs for library 1.06SaEc. BCs with fewer than 20 UMIs per cell were removed from further analysis. Line segments at x = 20 and y = 20. i, Distribution of E. coli BCs from species mixing plot in h. BCs above the threshold line were used for further analysis and considered single E. coli cells. Threshold line at log2(20). j,k, PCAs of E. coli (orange) and S. aureus (blue) BCs from library 1.06SaEc. For calculation of principal components, rRNA operons were omitted and counts were normalized and scaled as described in methods. In j, all S. aureus and E. coli BCs with greater than 20 total UMIs and greater than 0 mRNAs are included (13,786 S. aureus, 1,153 E. coli). In k, only BCs with greater than or equal to 15 mRNA UMIs are included (6,683 S. aureus, 800 E. coli). For 100% of S. aureus BCs, PC1 < 0.05, and for 100% of E. coli BCs, PC1 > 4.
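
The two read-level thresholds described in panels g and h can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name, the input layout (one `(barcode, umi)` tuple per read) and the default values are assumptions taken from the example library in this caption, and real thresholds were set per library. A production pipeline would also key UMIs by the gene they align to, which is omitted here.

```python
from collections import Counter, defaultdict

# Hypothetical defaults taken from the caption's example library.
MIN_READS_PER_UMI = 3   # keep UMIs supported by MORE than this many reads
MIN_UMIS_PER_BC = 20    # keep barcodes (BCs) with at least this many UMIs


def filter_barcodes(reads, min_reads_per_umi=MIN_READS_PER_UMI,
                    min_umis_per_bc=MIN_UMIS_PER_BC):
    """reads: iterable of (barcode, umi) tuples, one tuple per sequenced read.

    Returns {barcode: set of retained UMIs} for barcodes passing both filters.
    """
    # Collapse PCR duplicates: count the reads behind each (barcode, UMI) pair.
    reads_per_umi = Counter(reads)

    # Drop UMIs that are supported by too few reads.
    umis_by_bc = defaultdict(set)
    for (bc, umi), n_reads in reads_per_umi.items():
        if n_reads > min_reads_per_umi:
            umis_by_bc[bc].add(umi)

    # Drop barcodes with too few surviving UMIs to be treated as cells.
    return {bc: umis for bc, umis in umis_by_bc.items()
            if len(umis) >= min_umis_per_bc}
```

Applying the read-count filter before the per-barcode filter mirrors the order implied by the caption, where the knee-plot cut is deliberately inclusive so that stricter filtering can follow once duplicates are collapsed.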

Extended Data Fig. 2 Development and preliminary optimization of PETRI-seq.

a, qPCR after in situ RT with random hexamers shows higher yield of rpsB cDNA from fixation without media (pelleting before) than fixation with media (formaldehyde added to culture) [n = 3 technically independent samples (dots), p = 0.012, 2-sided t-test]. Bars show mean abundance. b, Transcriptome stabilized by RNAprotect after 2-minute spin was highly correlated with transcriptomes stabilized immediately by either RNAprotect or flash freezing. Pearson’s r is reported. c, RNA purified from E. coli cells after 16-hour 4% formaldehyde fixation (‘Fixed Bulk’) was highly correlated with non-fixed RNA (‘Standard Bulk’). 2,617 operons included. Pearson’s r is reported. d, qPCR after in situ RT with rpsB-specific primer (SB10) showed similar yield when cells were resuspended in 50% ethanol (n = 2 technically independent samples). e, qPCR after in situ RT with random hexamers shows improved yield of rpsB cDNA after lysozyme treatment (n = 3 technically independent samples [dots], p = 0.001, 2-sided t-test). Bars show mean abundance. f, qPCR after DNase treatment or incubation with only DNase buffer confirmed in situ DNase treatment efficacy (n = 8 technically independent samples [dots], p = 0.035, 2-sided t-test). Bars show mean abundance. g, qPCR after in situ RT with rpsB-specific primer (SB10) confirmed DNase inactivation, as yield was unchanged (n = 2 technically independent samples [dots]). Bars show mean proportion. h, Gel of 775-bp PCR fragment after 1-hour incubation with DNase-treated cells confirmed DNase inactivation. Right-most lane: DNase was directly added to PCR product. Experiment conducted one time. i, Aggregated PETRI-seq UMIs from DNase-treated and untreated libraries were highly correlated. Pearson’s r reported. j, Bioanalyzer traces of RNA purified after in situ DNase treatment and cell lysis (methods). k, Imaging after E. coli cell preparation. Images for all libraries looked similar (n = 8). 
l, qPCR after bulk RT and ligation (methods) confirmed effective ligation with a 16-base linker. Minor increase (1.5×) in ligation efficiency was detected (p = 0.001, n = 3 technically independent samples [dots], 2-sided t-test). Bars show mean proportion. m, qPCR after in situ RT showed cDNA retention after AMPure purification (n = 4 technically independent samples, p = 0.69, 2-sided t-test). Bars show mean abundance. n,o, Second-strand synthesis yielded more mRNAs and operons per cell (p < 10^−300, 2-sided Mann-Whitney U) than template switching. 10,000 BCs are included from unoptimized PETRI-seq (Experiment 1.08). Boxplots within violins show interquartile range (black box) and median (white circle).

Extended Data Fig. 3 Quantification of intercellular contamination using E. coli and S. aureus cells.

After defining single E. coli and S. aureus cells (Fig. 2b, Experiment 1.06SaEc), we examined levels of cross-contamination within single cells. Similar analysis for Experiment 2.01 is shown in Extended Data Fig. 7c, d. a, Quantification of S. aureus-aligned UMIs assigned to E. coli cells after standard PETRI-seq alignment (edit distance ≤1). Reads mapping equally well to both species are discarded. Bottom: Scatterplots of E. coli UMIs vs. absolute (left) or percent (right) S. aureus UMIs assigned to each E. coli cell. Top: Cumulative distributions corresponding to scatterplots. b, Quantification of E. coli-aligned UMIs assigned to S. aureus cells after standard alignment. Bottom: Scatterplots of S. aureus UMIs vs absolute (left) or percent (right) E. coli UMIs assigned to each S. aureus cell. Top: Cumulative distributions corresponding to scatterplots. c, mRNAs per E. coli cell in a. d, mRNAs per S. aureus cell in b. e,f, Same analysis as (a,b) but using more stringent alignment (edit distance = 0) to better understand source of contamination. g, mRNAs per E. coli cell in e. h, mRNAs per S. aureus cell in f. i,j, To further understand the impact of alignment on apparent cross-contamination, we used stringent alignment to map UMIs for a library of only E. coli (Experiment 1.10). Total UMIs (i) or percent of UMIs (j) assigned to S. aureus were determined after stringent alignment for a PETRI-seq library prepared with only E. coli. S. aureus UMIs are computational artifacts. E. coli cells include a mean of 0.02% S. aureus aligned UMIs, indicating that the majority of interspecies contamination observed in e is not caused by incorrect alignment. To quantify contamination, we needed to correct percentages of inter-species alignment based on species abundance in the library (25% of UMIs aligned to E. coli, 75% S. aureus) to predict the percent of UMIs in a given single-cell derived from any other cell (whether or not the same species). 
We predict a ‘corrected contamination rate’, or percent of UMIs in a single-cell transcriptome derived from another cell, of 0.19–0.36% \(\left(\frac{0.14}{0.75} = 0.19;\ \frac{0.09}{0.25} = 0.36\right)\).
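
The correction above is a single division, sketched here in Python as a worked check. The function name is hypothetical. Pairing 0.14% with the S. aureus share (75%) and 0.09% with the E. coli share (25%) follows the fractions in the caption; reading these as "S. aureus UMIs in E. coli cells" and "E. coli UMIs in S. aureus cells" is an interpretation of that pairing, not something the caption states outright.

```python
def corrected_contamination(observed_pct, other_species_fraction):
    """Scale an observed inter-species contamination percentage by the share of
    library UMIs that come from the other species, giving an estimate of the
    percent of UMIs derived from any other cell (same species or not)."""
    if not 0 < other_species_fraction <= 1:
        raise ValueError("other_species_fraction must be in (0, 1]")
    return observed_pct / other_species_fraction


# Library composition from the caption: 75% S. aureus UMIs, 25% E. coli UMIs.
in_ecoli_cells = corrected_contamination(0.14, 0.75)
in_saureus_cells = corrected_contamination(0.09, 0.25)
print(f"{in_ecoli_cells:.2f}% to {in_saureus_cells:.2f}%")
```

Dividing by the other species' share assumes contaminating UMIs are drawn from the library in proportion to its overall composition.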

Extended Data Fig. 4 Further evaluation of PETRI-seq for E. coli and S. aureus in Experiment 1.06SaEc.

a,b,c, Breakdown of total aligned UMIs (a,b) or reads (c) per cell for PETRI-seq exponential GFP- and RFP-expressing E. coli (a), PETRI-seq exponential S. aureus (b), and bulk exponential wild-type E. coli (c). Left: Stacked bar shows breakdown of sense and anti-sense alignments. Right: Pie shows breakdown of rRNA and mRNA alignments within the sense fraction. d, Distributions of mRNA UMIs (left) and operons (right) per S. aureus cell. 13,785 cells are included. 2 cells were omitted as they contained zero mRNAs. Boxplots within violins show interquartile range (black box) and median (white circle). e, Distributions of mRNA UMIs (left) and operons (right) per E. coli cell in five sub-populations, including GFP cells (contain GFP plasmid transcripts), RFP cells (contain RFP plasmid transcripts), ambiguous cells (contain no plasmid transcripts), and either RFP or GFP and ambiguous cells. Three ambiguous cells classified as E. coli in Fig. 2b were omitted as they contained zero mRNAs. Boxplots within violins show interquartile range (black box) and median (white circle). f, Distribution of total RNAs per GFP-containing exponential E. coli cell. 609 cells are included. g, Left, growth curves for PrplN-GFP, Ptet-RFP, and MG1655 (no plasmid) cells with and without aTc. Right, doubling times calculated from the growth curves. Ptet-RFP had a significantly longer doubling time than all other strains/conditions when induced with aTc (n = 4, p = 2.2 × 10^−5, 2.5 × 10^−5, 2.1 × 10^−5, 3.6 × 10^−5, 2.6 × 10^−5 [for each sample moving left to right], 2-sided t-test), which might explain fewer mRNA UMIs in these cells.

Extended Data Fig. 5 Further evaluation of growth phase characterization by PETRI-seq.

a, PCA of Experiment 1.06 (biological replicate of 1.10) shows that PETRI-seq can reproducibly distinguish between stationary and exponential cells by projecting cells onto the principal components calculated from the first library (bottom). 2,724 cells are included. 1,551 cells are left of the threshold (PC1=0.34), and 1,173 cells are right of the threshold. mRNA UMIs captured per cell on either side of the threshold line are shown (top). b, PCA as in Fig. 3b, but UMI counts were normalized using sctransform26. c, Expression along PC1 (Fig. 3b, Experiment 1.10) of operons with the most positive or negative PC1 loadings (z-scored moving average, size=1,000 cells). d, Distribution of mRNA UMIs per cell (Experiment 1.10) on either side of the threshold line in Fig. 3b. Grey cells (without plasmid UMIs) are included. Only cells with greater than 14 mRNA UMIs per cell were included, as cells with fewer were excluded from the PCA. 4,878 cells are left of the threshold, and 2,509 cells are right of the threshold. e,f, Breakdown of total aligned UMIs per cell for Experiment 1.10 for cells above and below the PC1 threshold in Fig. 3b. In e, Exponential E. coli (above the threshold) are shown and in f, stationary E. coli (below the threshold) are shown. Left: Stacked bar shows breakdown of sense and anti-sense alignments. Right: Pie shows breakdown of rRNA and mRNA alignments within the sense fraction.

Extended Data Fig. 6 Additional optimization of PETRI-seq by increasing ligation primer concentration and adding detergent during barcoding.

a, Increasing the concentration of round 3 ligation primers by 4x relative to previous experiments (1.06SaEc and 1.10) increases mRNA UMIs per cell 2.7-fold for GFP-expressing exponential (green) and RFP-expressing stationary E. coli cells (red). Boxplots within violins show interquartile range (black box) and median (white circle). b, Adding detergent (tween-20) to cells before ligation 1 and after ligation 3 increased mRNA UMIs per cell 1.4-fold relative to original PETRI-seq for wild-type exponential E. coli cells. Boxplots within violins show interquartile range (black box) and median (white circle). c, With 10x more RT primer relative to original PETRI-seq, we observed a shift in the breakdown of sense/anti-sense and mRNA/rRNA UMIs. Left: Stacked bar shows breakdown of sense and anti-sense alignments. Right: Pie shows breakdown of rRNA and mRNA alignments within the sense fraction. Proportions of anti-sense RNAs and sense rRNAs are significantly increased. We hypothesized that any condition effectively increasing the intracellular concentration of RT primers could lead to this undesirable shift. For this reason, detergent was only ever added after RT to avoid further permeabilizing cells and increasing the effective concentration of RT primer. d, Combining detergent treatment and increased ligation primer (for both rounds) resulted in higher mRNA capture for wild-type exponential E. coli cells. Detergent again increased mRNA UMIs per cell (1.5-fold). Boxplots within violins show interquartile range (black box) and median (white circle). e, Optimized PETRI-seq (4x ligation primer, detergent treatment) resulted in S. aureus transcriptomes with a median of 43 mRNA UMIs per cell (left) and 35 operons per cell (right). Boxplots within violins show interquartile range (black box) and median (white circle). f,g, Breakdown of total aligned UMIs per cell for optimized PETRI-seq (Experiment 2.01) for exponential (f) and stationary E. coli (g). 
Left: Stacked bar shows breakdown of sense and anti-sense alignments. Right: Pie shows breakdown of sense rRNA and mRNA alignments. h,i, Distributions of total UMIs per E. coli (h) and S. aureus (i) BCs in Experiment 2.01. Given higher capture, we imposed higher thresholds for distinguishing cells from background than used previously (Extended Data Fig. 1i). E. coli BCs with more than 128 total UMIs (threshold line in h) and S. aureus BCs with more than 32 total UMIs (threshold line in i) were considered cells.

Extended Data Fig. 7 Multiplet frequency and intercellular contamination for optimized PETRI-seq.

a, Species mixing plot for PETRI-seq with 4x ligation primers and no detergent. The multiplet frequency is 0.7%, which is 5-fold higher than the Poisson expectation of 0.14% for 2,423 BCs. b, Species mixing plot for PETRI-seq with 4x ligation primers and detergent (Experiment 2.01). The multiplet frequency is 2.8%, which is 4.7-fold higher than the Poisson expectation of 0.6% for 10,797 BCs. This indicates that compared to no detergent, detergent treatment did not significantly increase multiplet frequency relative to the Poisson expectation. In (a,b), E. coli BCs with > 128 total UMIs and S. aureus BCs with > 32 total UMIs were included. c,d, Quantification of cross-contamination for PETRI-seq with 4x ligation primers and no detergent (c, same experiment as a) or 4x ligation primers and detergent (d, Experiment 2.01 as in b). Scatterplots show the percent of total UMIs for each cell aligned to the incorrect species. Reads were aligned using the stringent alignment (edit distance = 0) described in Extended Data Fig. 3. Top left: Percent of S. aureus UMIs in exponential E. coli cells (based on first round barcode). Top right: Percent of S. aureus UMIs in stationary E. coli cells (based on first round barcode). Bottom left: Percent of E. coli UMIs in S. aureus cells barcoded with exponential E. coli (based on first round barcode). Bottom right: Percent of E. coli UMIs per S. aureus cell barcoded with stationary E. coli (based on first round barcode). As described in Extended Data Fig. 3, we used these inter-species contamination rates to predict a corrected contamination rate (including intra-species contamination). Though higher than the contamination rates observed in the previous species mixing experiment (Extended Data Fig. 3e, f), these rates are comparable to previous findings for eukaryotic scRNA-seq methods23,24 and are not affected by detergent treatment (c vs. d). 
Furthermore, we anticipate that contamination could be reduced by additional washing prior to cell lysis (see ‘Future directions for optimization’ in Methods).
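
The Poisson expectations quoted in panels a and b can be approximated with the short Python sketch below. It assumes cells are distributed uniformly at random over 96 × 96 × 96 barcode combinations. That figure is not stated in this caption and is used here as an assumption; the function name is likewise hypothetical. Under that assumption the sketch gives values close to the quoted 0.14% and 0.6%.

```python
import math

# Assumption: three barcoding rounds with 96 barcodes each.
N_BARCODE_COMBINATIONS = 96 ** 3


def poisson_multiplet_fraction(n_cells, n_combinations=N_BARCODE_COMBINATIONS):
    """Expected fraction of occupied barcode combinations that hold two or
    more cells, if cells land on combinations uniformly at random."""
    lam = n_cells / n_combinations        # mean cells per barcode combination
    p_exactly_one = lam * math.exp(-lam)  # P(exactly one cell)
    p_occupied = 1.0 - math.exp(-lam)     # P(at least one cell)
    return 1.0 - p_exactly_one / p_occupied


# Observed multiplet frequencies from panels a and b.
for n_bcs, observed_pct in [(2423, 0.7), (10797, 2.8)]:
    expected_pct = 100 * poisson_multiplet_fraction(n_bcs)
    print(f"{n_bcs} BCs: expected {expected_pct:.2f}%, "
          f"observed/expected = {observed_pct / expected_pct:.1f}")
```

Using the number of recovered BCs as the number of cells is a further simplification. The fold-changes printed here can differ slightly from those in the caption, which appear to use rounded expectations.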

Extended Data Fig. 8 Comparison of plasmid-labeled (Experiment 1.10) and RT-labeled (Experiment 2.01) mixed growth stage libraries reveals minimal cross-contamination between E. coli cells barcoded together.

In Experiment 2.01, exponential and stationary cells were prepared separately and then barcoded independently during RT. In contrast, the RFP-expressing stationary cells and GFP-expressing exponential cells barcoded in Experiment 1.10 were combined for fixation and barcoded together, resulting in more opportunity for cross-contamination. Experiment 2.01 is thus a useful reference to quantify this cross-contamination. To account for differences in capture efficiency between the two experiments, cells were down-sampled to 30 mRNA UMIs. a, PCA for all 4 cell types reveals that the two stationary populations are biologically distinct, possibly because they were grown independently to slightly different ODs, and RFP cells were induced with aTc. In contrast, the two exponential populations appear very similar. b, PC1 was calculated using only the stationary cells from both experiments. Right: The receiver operating characteristic (ROC) shows that PC1 is a strong classifier of the two states. c, PC1 was calculated using only exponential cells from both experiments. Right: The ROC shows that PC1 is a weak classifier of the two exponential states with performance similar to random assignment (Area Under the ROC Curve [AUC] = 0.5). d, PC1 was calculated using wild-type exponential cells from Experiment 2.01, GFP-expressing exponential cells from Experiment 1.10, and RFP-expressing stationary cells from Experiment 1.10 to quantify cross-contamination between the GFP and RFP cells, using the wild-type exponential cells from Experiment 2.01 as a reference. Right: ROC shows that PC1 is a strong classifier of exponential and stationary cells. The probability that the PC1 value of a wild-type exponential cell is lower than the PC1 value of a stationary RFP cell is 99.9% (AUC = 0.999), while the probability that the PC1 value of a GFP exponential cell is lower than the PC1 value of a stationary RFP cell is 99.67% (AUC = 0.9967). Thus, for the GFP exponential cells, 23 out of 10,000 cell pairs (1 exponential, 1 stationary) will be incorrectly ranked due to cross-contamination in the GFP cells (0.999 − 0.9967 = 0.0023). Finally, we confirmed that in the original library for Experiment 1.10, the relative representation of UMIs from exponential and stationary cells was roughly equal (50.3% stationary, 45.6% exponential), indicating that the cross-contamination analysis for the GFP exponential population would be reciprocal for the RFP stationary population.
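
The legend above reads the AUC as a pairwise ranking probability: the chance that a randomly chosen exponential cell has a lower PC1 value than a randomly chosen stationary cell. A minimal sketch of that interpretation follows. The PC1 values are hypothetical toy numbers, not data from the paper, and `pairwise_auc` is an illustrative helper rather than the authors' code. The last lines reproduce the "23 out of 10,000" figure as the difference between the two reported AUCs, which is one reading of the legend's arithmetic.

```python
def pairwise_auc(lower_group, higher_group):
    """Fraction of (lower, higher) pairs ranked correctly; ties count as half."""
    correct = 0.0
    for a in lower_group:
        for b in higher_group:
            if a < b:
                correct += 1.0
            elif a == b:
                correct += 0.5
    return correct / (len(lower_group) * len(higher_group))


# Hypothetical PC1 values for illustration only.
exponential_pc1 = [-2.1, -1.8, -1.5, -0.9, 0.4]
stationary_pc1 = [0.2, 1.1, 1.7, 2.3]
auc = pairwise_auc(exponential_pc1, stationary_pc1)

# AUC values reported in the legend.
auc_reference = 0.999   # wild-type exponential vs. RFP stationary
auc_gfp = 0.9967        # GFP exponential vs. RFP stationary

# Extra misranked pairs per 10,000, attributed to cross-contamination.
extra_misranked_per_10000 = round((auc_reference - auc_gfp) * 10_000)
```

In the toy data, 19 of the 20 exponential/stationary pairs are ordered correctly, so the pairwise estimate equals the AUC a threshold sweep over PC1 would give.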

Extended Data Fig. 9 Defining consensus transcriptional states of sub-populations by aggregating single-cell transcriptomes.

a, Correlation between mRNA abundances from 3,547 aggregated wild-type exponential cells (Experiment 2.01) vs. bulk preparation from fixed exponential wild-type E. coli cells. The Pearson correlation coefficient (r) was calculated for 2,150 out of 2,612 total operons, excluding those with zero counts in either library (grey points), or for all 2,612 operons. Bulk library was prepared from the same cells as the PETRI-seq library. b, Bottom: The correlation between the aggregated mRNA counts of single exponential cells (PETRI-seq) and the bulk exponential library increases as more single cells are included. Correlations were calculated from log10(TPM + 1) for each sample. Top: Difference between top curve and bottom curve in plot below, based on best-fit lines (y = ln(x) + b, r > 0.98). c, Correlation between RNA abundances from 4,627 aggregated wild-type stationary cells (Experiment 2.01) vs. bulk preparation from fixed wild-type stationary E. coli cells. The Pearson correlation coefficient (r) was calculated for 2,050 out of 2,612 total operons, excluding those with zero counts in either library (grey points), or for all 2,612 operons. Bulk library was prepared from the same cells as the PETRI-seq library. d, Bottom: The correlation between the aggregated mRNA counts of single stationary cells (PETRI-seq) and the bulk stationary library increases as more single cells are included. Correlations were calculated from log10(TPM + 1) for each sample. Top: Difference between top curve and bottom curve in plot below, based on best-fit lines (y = ln(x) + b, r > 0.98).
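
The comparison in this legend (pool single-cell counts, convert both libraries to log10(TPM + 1), then compute Pearson's r with or without zero-count operons) can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the toy counts and the operon lengths are all hypothetical, and TPM is computed with the standard length normalization, which the legend does not spell out.

```python
import math


def tpm(counts, lengths):
    """Standard TPM: length-normalize counts, then scale so the values sum to 1e6."""
    rates = [c / l for c, l in zip(counts, lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def aggregate(cells):
    """Sum per-operon UMI counts over all cells (rows = cells, columns = operons)."""
    return [sum(column) for column in zip(*cells)]


def log_tpm_correlation(cells, bulk_counts, lengths, drop_zeros=True):
    """Return (r, operons_used) for pooled single cells vs. a bulk library."""
    pooled = aggregate(cells)
    sc_log = [math.log10(v + 1) for v in tpm(pooled, lengths)]
    bulk_log = [math.log10(v + 1) for v in tpm(bulk_counts, lengths)]
    # Optionally exclude operons with zero counts in either library.
    keep = [i for i in range(len(pooled))
            if not drop_zeros or (pooled[i] > 0 and bulk_counts[i] > 0)]
    r = pearson([sc_log[i] for i in keep], [bulk_log[i] for i in keep])
    return r, len(keep)


# Hypothetical toy data: 3 cells x 4 operons, plus one bulk library.
toy_cells = [[1, 0, 2, 0], [0, 1, 3, 0], [2, 0, 1, 0]]
toy_bulk = [30, 12, 55, 4]
toy_lengths = [1000, 1500, 2000, 800]
r, operons_used = log_tpm_correlation(toy_cells, toy_bulk, toy_lengths)
```

Passing `drop_zeros=False` corresponds to the "all operons" variant mentioned in panels a and c.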

Extended Data Fig. 10 PETRI-seq detects rare transcriptional states and candidate genes with highly variable expression.

a, PCA detects rare transcriptional states among 6,663 S. aureus cells. A small sub-population of 28 cells (red) expressed operons from the φSA3usa phage. b, Distribution of PC1 loadings for all operons included in the S. aureus analysis. Eight operons from the φSA3usa phage have the highest PC1 loadings. c, Map of genomic region33 surrounding φSA3usa in the genome of S. aureus strain USA300. Red arrows indicate phage operons upregulated along PC1. d, Percent of mRNA UMIs mapped to the φSA3usa phage for the 28 cells containing phage UMIs. Three cells are composed of >77% phage transcripts. e, Noise (σ²/μ²) versus mean (μ) for operon expression within an S. aureus population of 6,663 cells. 676 operons are included. The circled operon (red) is SAUSA300_1933-1925, which deviated significantly from the rest of the distribution (z-score = 20.6 [determined by residuals from linear regression (see methods)], p = 10⁻⁹⁴, FDR < 0.01). f,g, Noise (σ²/μ²) versus mean (μ) for operon expression in either exponential (f) or stationary (g) E. coli populations from Experiment 2.01. 1,960 operons are included in (f) and 1,219 operons in (g). Five operons significantly (FDR < 0.01, z-scores determined by residuals from linear regression [see methods]) deviated from the other operons in (f): sip-dctR (z-score = 7.3, p = 3 × 10⁻¹³), murJ (z-score = 6.7, p = 3 × 10⁻¹¹), fimAICDFGH (z-score = 5.4, p = 7 × 10⁻⁸), mdtL (z-score = 4.8, p = 1 × 10⁻⁶), rnhA (z-score = 4.6, p = 4 × 10⁻⁶). fimAICDFGH, which encodes the type I fimbriae system, has been shown previously to exhibit population-level phase variation that is mediated by transcriptional control37. In (e-g), lines at y = −x indicate Poisson noise, where σ² = μ. Operon counts were normalized for each cell before plotting. Operons with fewer than 6 raw total UMIs and a mean less than 0.002 after normalization were excluded.
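
The Poisson reference line in panels e to g follows from the definition of noise: if the variance equals the mean, then variance / mean² = 1 / mean, so log(noise) = −log(mean) and the line has slope −1 on log-log axes. The sketch below illustrates this along with a simple residual-based z-score. It is only a rough illustration: the toy numbers are hypothetical, the regression is assumed to be an ordinary least-squares fit in log-log space, and the population variance is used. The authors' exact procedure is described in their methods and may differ.

```python
import math
import statistics


def noise_and_mean(counts):
    """Return (noise, mean) for one operon, with noise = variance / mean**2."""
    mu = statistics.fmean(counts)
    var = statistics.pvariance(counts, mu)  # population variance (an assumption)
    return var / mu ** 2, mu


def poisson_log_noise(log_mean):
    """Poisson reference: variance == mean, so log(noise) == -log(mean)."""
    return -log_mean


def residual_z_scores(log_means, log_noises):
    """Fit log_noise ~ log_mean by least squares and standardize the residuals."""
    n = len(log_means)
    mx = sum(log_means) / n
    my = sum(log_noises) / n
    sxx = sum((x - mx) ** 2 for x in log_means)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_means, log_noises))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x)
                 for x, y in zip(log_means, log_noises)]
    sd = statistics.pstdev(residuals)
    return [res / sd for res in residuals]


# Hypothetical counts for one operon across four cells: mean 1, variance 3.
noise, mu = noise_and_mean([0, 0, 0, 4])
excess = math.log10(noise) - poisson_log_noise(math.log10(mu))

# Hypothetical log10(mean) and log10(noise) values for five operons;
# the fourth sits well above the trend of the others.
toy_log_means = [-2.0, -1.5, -1.0, -0.5, 0.0]
toy_log_noises = [2.0, 1.5, 1.0, 1.6, 0.0]
z = residual_z_scores(toy_log_means, toy_log_noises)
outlier_index = z.index(max(z))
```

Here `excess` measures how far an operon lies above the Poisson line in log10 units, and the operon with the largest z-score is the toy analogue of the circled outlier in panel e.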

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2, and Tables 1 and 2.

Reporting Summary

Supplementary Tables 3–5

Supplementary Table 3: 96-well oligonucleotides used for PETRI-seq barcoding. Supplementary Table 4: overview of experiments included in this study. Supplementary Table 5: supplementary statistical data for Fig. 3, Extended Data Fig. 2 and Extended Data Fig. 5.

Supplementary Table 6

Count matrix for experiments 1.06SaEc, 1.10 and 2.01, and Bulk Libraries. Anti-sense operons were excluded. BCs with the prefix SB346 are from experiment 1.06SaEc; 394A from 1.10; and SB442 from 2.01. Bulk libraries for stationary-phase RFP-expressing E. coli cells (SB369) and exponential-phase GFP-expressing E. coli cells (SB371) are also included; reads, rather than UMIs, are reported for bulk libraries. Operon names with the prefix ‘U00096:’ originate from E. coli, whereas operons with the prefix ‘CP000255:’ originate from S. aureus.

About this article

Cite this article

Blattman, S.B., Jiang, W., Oikonomou, P. et al. Prokaryotic single-cell RNA sequencing by in situ combinatorial indexing. Nat Microbiol 5, 1192–1201 (2020). https://doi.org/10.1038/s41564-020-0729-6

  • DOI: https://doi.org/10.1038/s41564-020-0729-6

  • Springer Nature Limited
