CVGAE: A Self-Supervised Generative Method for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data

  • Original research article
Interdisciplinary Sciences: Computational Life Sciences

Abstract

Gene regulatory network (GRN) inference from single-cell RNA sequencing (scRNA-seq) data plays a crucial role in understanding the regulatory mechanisms between genes. Various computational methods have been employed for GRN inference, but their network accuracy and model generalization remain unsatisfactory, largely because of high-dimensional data and network sparsity. In this paper, we propose CVGAE, a self-supervised method for gene regulatory network inference using single-cell RNA sequencing data. CVGAE uses a graph neural network for inductive representation learning, which merges gene expression data and the observed topology into a low-dimensional vector space. The trained vectors are then used to calculate the mathematical distance between genes and, in turn, to predict interactions between them. Within the overall framework, FastICA is applied to relieve the computational complexity caused by high-dimensional data, and CVGAE adopts multiple stacked GraphSAGE layers as an encoder together with an improved decoder to overcome network sparsity. CVGAE is evaluated on several single-cell datasets with four related ground-truth networks, and the results show that CVGAE achieves better performance than the comparative methods. To validate its learning and generalization capabilities, CVGAE is also applied in a few-shot setting by changing the ratio of the training set to the test set. Under few-shot conditions, CVGAE obtains comparable or superior performance.
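The encode-then-score idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the toy graph, the random weights, the function names `sage_mean_layer` and `edge_scores`, and the sigmoid inner-product decoder are all assumptions made for illustration (the paper describes an improved decoder whose details are not given here), and the feature matrix `x` merely stands in for FastICA-reduced expression data.

```python
import numpy as np

def sage_mean_layer(h, adj, w_self, w_neigh):
    """One GraphSAGE-style layer: mean-aggregate neighbours, then ReLU."""
    deg = adj.sum(axis=1, keepdims=True)
    # Mean of neighbour features; isolated nodes fall back to a zero vector.
    neigh_mean = (adj @ h) / np.maximum(deg, 1.0)
    return np.maximum(h @ w_self + neigh_mean @ w_neigh, 0.0)

def edge_scores(z):
    """Sigmoid of pairwise inner products (a simple stand-in decoder)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))

rng = np.random.default_rng(0)
n_genes, n_features, n_hidden = 5, 8, 4

# Toy inputs (hypothetical): reduced expression features and observed edges.
x = rng.normal(size=(n_genes, n_features))
adj = np.array([
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
], dtype=float)  # undirected here for simplicity; real GRNs are directed

# Two stacked layers with untrained random weights, for shape illustration.
w1_self = rng.normal(scale=0.1, size=(n_features, n_hidden))
w1_neigh = rng.normal(scale=0.1, size=(n_features, n_hidden))
w2_self = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
w2_neigh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))

h1 = sage_mean_layer(x, adj, w1_self, w1_neigh)
z = sage_mean_layer(h1, adj, w2_self, w2_neigh)   # gene embeddings
scores = edge_scores(z)                           # candidate interaction scores
print(scores.round(3))
```

In a trained model the weights would be fitted so that observed edges receive high scores, and unobserved gene pairs would then be ranked by score to propose new regulatory interactions.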

Graphical Abstract

CVGAE utilizes a beta variational autoencoder framework in conjunction with graph neural networks to characterize the underlying gene regulatory networks in single-cell gene expression data. The model employs multiple stacked SAGE layers to produce embedding representations of domain nodes, ensuring that the vector representation adheres to a multivariate Gaussian distribution. CVGAE leverages further convolutional computation and multi-layer perceptrons to determine the strength of interactions between nodes.
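For orientation, the generic objective of a beta variational graph autoencoder can be written as below. This is the standard textbook form, stated under assumptions: the symbols $X$ (node features), $A$ (adjacency matrix), $Z$ (embeddings with rows $z_i$), $\mu_{ij}$, $\sigma_{ij}$, $d$ (embedding dimension), and $\beta$ are introduced here for illustration, and the exact loss used by CVGAE may differ.

```latex
% Encoder: a diagonal Gaussian posterior for each node (gene) embedding
q(Z \mid X, A) = \prod_{i=1}^{N} \mathcal{N}\!\left(z_i \mid \mu_i, \operatorname{diag}(\sigma_i^2)\right),
\qquad p(Z) = \prod_{i=1}^{N} \mathcal{N}(z_i \mid 0, I)

% Objective to maximize: reconstruction term minus a beta-weighted KL penalty
\mathcal{L} = \mathbb{E}_{q(Z \mid X, A)}\!\left[\log p(A \mid Z)\right]
  - \beta \, \mathrm{KL}\!\left(q(Z \mid X, A) \,\|\, p(Z)\right)

% Closed form of the KL term for diagonal Gaussians
\mathrm{KL}\!\left(q \,\|\, p\right)
  = \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{d}
    \left(\mu_{ij}^2 + \sigma_{ij}^2 - \log \sigma_{ij}^2 - 1\right)
```

Setting $\beta = 1$ recovers the usual variational autoencoder objective; larger values push the embeddings more strongly toward the Gaussian prior.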

Funding

The Scientific Research Fund of Hunan Provincial Education Department, 22A0101, Wei Liu

Author information

Corresponding authors

Correspondence to Wei Liu or Jing Chen.

Ethics declarations

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 2560 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, W., Teng, Z., Li, Z. et al. CVGAE: A Self-Supervised Generative Method for Gene Regulatory Network Inference Using Single-Cell RNA Sequencing Data. Interdiscip Sci Comput Life Sci (2024). https://doi.org/10.1007/s12539-024-00633-y

  • DOI: https://doi.org/10.1007/s12539-024-00633-y
