Disentangled Representations of Cellular Identity

  • Ziheng Wang
  • Grace H. T. Yeo
  • Richard Sherwood
  • David Gifford
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11467)


We introduce a disentangled representation for cellular identity that constructs a latent cellular state from a linear combination of condition-specific basis vectors, which is then decoded into gene expression levels. The basis vectors are learned with a deep autoencoder model from single-cell RNA-seq data. Linear arithmetic in the disentangled representation successfully predicts nonlinear gene expression interactions between biological pathways in unobserved treatment conditions. We recover the mean gene expression profiles of unobserved conditions with an average Pearson r = 0.73, which outperforms two linear baselines, one with an average r = 0.43 and another with an average r = 0.19. Disentangled representations promise new explanatory power for the interaction of biological pathways and for predicting the effects of unobserved conditions in applications such as combinatorial therapy and cellular reprogramming. Our work is motivated by recent advances in deep generative models that have enabled synthesis of images and natural language with desired properties from interpolation in a “latent representation” of the data.
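The idea of composing a latent state from condition-specific basis vectors and then decoding it can be illustrated with a minimal sketch. This is not the authors' implementation: the dimensions, the random weights standing in for learned parameters, the two-layer `tanh` decoder, and the names `basis`, `latent_state`, `decode`, and `pearson_r` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
n_conditions, latent_dim, hidden_dim, n_genes = 3, 8, 16, 50

# Stand-ins for learned parameters (random here; in the paper the basis
# vectors and decoder are learned from single-cell RNA-seq data).
basis = rng.normal(size=(n_conditions, latent_dim))  # one basis vector per condition
W1 = rng.normal(size=(latent_dim, hidden_dim))
W2 = rng.normal(size=(hidden_dim, n_genes))


def latent_state(indicator):
    """Latent cellular state as a linear combination of basis vectors."""
    return np.asarray(indicator, dtype=float) @ basis


def decode(z):
    """Toy nonlinear decoder from a latent state to gene expression levels."""
    return np.tanh(z @ W1) @ W2


def pearson_r(x, y):
    """Pearson correlation between two expression profiles."""
    return float(np.corrcoef(x, y)[0, 1])


# Latent arithmetic: the state for an unobserved combination of
# conditions 0 and 1 is the sum of their individual basis vectors.
z_a = latent_state([1, 0, 0])
z_b = latent_state([0, 1, 0])
z_ab = latent_state([1, 1, 0])

# The combination is linear in latent space but nonlinear in expression space.
predicted = decode(z_ab)
additive = decode(z_a) + decode(z_b)
```

In this sketch the latent combination is exactly additive, while the decoded profile `predicted` generally differs from `additive`, which is the sense in which linear latent arithmetic can express nonlinear interactions. Given an observed mean profile for a held-out condition, `pearson_r(predicted, observed_mean)` would give the kind of score reported in the abstract.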


Keywords: Single-cell RNA-seq, Gene expression, Generative modeling, Deep learning



We acknowledge the members of the Gifford and Sherwood labs for helpful discussion.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Science, M.I.T., Cambridge, USA
  2. Division of Genetics, Brigham and Women’s Hospital, Boston, USA
  3. Department of Medicine, Harvard Medical School, Boston, USA
  4. Hubrecht Institute, Utrecht, The Netherlands
  5. Department of Biological Engineering, M.I.T., Cambridge, USA
