
cKBET: assessing goodness of batch effect correction for single-cell RNA-seq

  • Research Article
  • Frontiers of Computer Science

Abstract

Single-cell RNA sequencing (scRNA-seq) reveals the gene structure and gene expression status of individual cells, and can therefore reflect heterogeneity between cells. However, batch effects caused by non-biological factors can hinder data integration and downstream analysis. Batch effects are often assessed by visualizing the data, but such visual assessment is subjective and imprecise. In this work, we propose cKBET, a quantitative method that considers batch and cell-type information simultaneously. cKBET assesses batch effects by comparing, within each cell type, the global and local fractions of cells from different batches. We evaluate cKBET on simulated and real biological data sets, and the experimental results show that it outperforms existing methods in most cases. Overall, cKBET can detect batch effects whether cell types are balanced or unbalanced across batches, and can thus be used to evaluate batch correction methods.
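The comparison of global and local batch fractions within each cell type can be illustrated with a small kBET-style sketch. This is not the authors' implementation: the function names (`ckbet_sketch`, `chi2_sf`), the Pearson chi-square test at significance level `alpha`, the neighbourhood size `k`, the use of plain Euclidean coordinates as the embedding, and the choice to search neighbours only among cells of the same annotated cell type are all assumptions made for illustration. For each cell type, the sketch takes that type's global batch fractions as the expected composition, tests each cell's local neighbourhood against them, and reports the fraction of tests rejected (a lower value suggests better mixing).

```python
import math
from collections import Counter, defaultdict


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + sum_{i=1}^{(df-1)/2} exp(-h) * h^(i - 1/2) / Gamma(i + 1/2)
    total = math.erfc(math.sqrt(h))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-h + (i - 0.5) * math.log(h) - math.lgamma(i + 0.5))
    return total


def ckbet_sketch(points, batches, cell_types, k=10, alpha=0.05):
    """Hypothetical cKBET-style score: fraction of tested cells whose local batch
    composition differs significantly from the global batch composition of
    their own cell type (lower suggests better batch mixing)."""
    by_type = defaultdict(list)
    for i, t in enumerate(cell_types):
        by_type[t].append(i)

    tested = rejected = 0
    for members in by_type.values():
        global_counts = Counter(batches[i] for i in members)
        if len(global_counts) < 2:
            continue  # a cell type present in only one batch cannot show mixing
        n = len(members)
        k_eff = min(k, n - 1)
        for i in members:
            # Brute-force Euclidean k-NN, restricted (by assumption) to the same cell type
            dists = sorted((math.dist(points[i], points[j]), j) for j in members if j != i)
            local = Counter(batches[j] for _, j in dists[:k_eff])
            # Pearson chi-square: local counts vs. counts expected from global fractions
            stat = 0.0
            for b, c in global_counts.items():
                expected = k_eff * c / n
                stat += (local[b] - expected) ** 2 / expected
            tested += 1
            if chi2_sf(stat, len(global_counts) - 1) < alpha:
                rejected += 1
    return rejected / tested if tested else 0.0
```

In small toy examples, if one batch contains an extra cell type that the other batch lacks, a type-blind run (every cell given the same label) flags the cells of that extra type as poorly mixed, whereas the per-type run simply skips them; this mirrors the abstract's point about unbalanced cell types. The brute-force neighbour search is quadratic in the number of cells and is only meant for small examples; a practical version would use a spatial index and a reduced embedding such as PCA coordinates.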



Acknowledgements

This work was supported by the National Natural Science Foundation of China (NSFC; Grant No. 11631012).

Author information


Corresponding author

Correspondence to Limin Li.

Additional information

Yameng Zhao received her bachelor's degree from Hunan University, China in 2020. She is currently a doctoral candidate at the School of Mathematics and Statistics, Xi'an Jiaotong University, China. Her research interests are applications in bioinformatics.

Yin Guo received her bachelor's degree from Minzu University of China, China in 2018. She is currently a doctoral candidate at the School of Mathematics and Statistics, Xi'an Jiaotong University, China. Her research interests are applications in bioinformatics.

Limin Li received her bachelor's and master's degrees from Zhejiang University, China in 2004 and 2006, respectively, and her PhD in mathematics from the University of Hong Kong, China in 2010. She then worked as a postdoctoral fellow at the Max Planck Institute for Intelligent Systems. She is currently a professor at the School of Mathematics and Statistics, Xi'an Jiaotong University, China. Her research interests include machine learning and its applications in bioinformatics.


Cite this article

Zhao, Y., Guo, Y. & Li, L. cKBET: assessing goodness of batch effect correction for single-cell RNA-seq. Front. Comput. Sci. 18, 181901 (2024). https://doi.org/10.1007/s11704-022-2111-8
