Enhancing performance of gene expression value prediction with cluster-based regression

  • Research Article
Genes & Genomics

Abstract

Background

Inherent correlations among the expression levels of different genes have received considerable attention. It was recently reported that a set of approximately 1000 landmark genes can be used to predict the expression of other genes (target genes).

Objective

The objective of this study is to predict the expression values of target genes from the expression values of landmark genes.
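As a toy illustration of this setting (not the author's code), the landmark profiles can be held in a matrix with one row per sample and one column per landmark gene, and the target genes in a matrix with one column per target gene. The sketch below assumes NumPy, uses hypothetical function names, and fits the simplest possible mapping, a single least-squares linear model with an intercept, only to make the input/output layout concrete. Five synthetic "landmark" columns stand in for the roughly 1000 real ones.

```python
import numpy as np


def fit_linear_baseline(X_landmark, Y_target):
    """Least-squares map from landmark expression (n x L) to target expression (n x T)."""
    # Append a column of ones so each target gene gets its own intercept.
    A = np.hstack([X_landmark, np.ones((len(X_landmark), 1))])
    W, *_ = np.linalg.lstsq(A, Y_target, rcond=None)
    return W  # shape (L + 1, T): one weight column per target gene


def predict_linear_baseline(W, X_landmark):
    A = np.hstack([X_landmark, np.ones((len(X_landmark), 1))])
    return A @ W  # shape (n, T)


# Synthetic toy data (hypothetical): 50 samples, 5 "landmark" genes, 3 "target" genes.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
W_true = rng.normal(size=(6, 3))
Y = np.hstack([X, np.ones((50, 1))]) @ W_true  # targets exactly linear in landmarks

W = fit_linear_baseline(X, Y)
Y_hat = predict_linear_baseline(W, X)
print(np.abs(Y_hat - Y).max())  # close to zero: purely linear targets are recovered
```

Real target genes are not exact linear functions of the landmarks, which is what motivates more flexible models such as the cluster-based approach described next.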

Methods

A cluster-based regression method is proposed. For each target gene, the training instances are grouped into clusters and a separate estimator is fitted to each cluster. A test instance is assigned to one of the clusters, and the regression model of that cluster predicts its expression value.
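The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not say which clustering algorithm or regressor is used, so the sketch assumes k-means (Lloyd's algorithm with farthest-first seeding) on the landmark expression vectors and closed-form ridge regression per cluster, using NumPy only. Clustering is done on the landmark features because a test instance has to be assigned to a cluster before its target value is known. `ClusterRegressor`, `kmeans`, `assign`, `fit_ridge`, and `make_data` are hypothetical names, and one model is fitted per target gene.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid (squared Euclidean distance) for each row of X."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-first seeding; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    while len(centroids) < k:
        # Next seed: the instance farthest from every centroid chosen so far.
        C = np.array(centroids)
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(X, centroids)


def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept."""
    A = np.hstack([X, np.ones((len(X), 1))])
    reg = alpha * np.eye(A.shape[1])
    reg[-1, -1] = 0.0  # do not shrink the intercept
    return np.linalg.solve(A.T @ A + reg, A.T @ y)


class ClusterRegressor:
    """Hypothetical cluster-then-regress model for ONE target gene."""

    def __init__(self, n_clusters=2, alpha=1e-3, seed=0):
        self.n_clusters, self.alpha, self.seed = n_clusters, alpha, seed

    def fit(self, X, y):
        self.centroids_, labels = kmeans(X, self.n_clusters, seed=self.seed)
        self.weights_ = []
        for j in range(self.n_clusters):
            mask = labels == j
            # Fall back to all training data if a cluster ends up empty.
            Xj, yj = (X[mask], y[mask]) if mask.any() else (X, y)
            self.weights_.append(fit_ridge(Xj, yj, self.alpha))
        return self

    def predict(self, X):
        labels = assign(X, self.centroids_)        # cluster of each test instance
        A = np.hstack([X, np.ones((len(X), 1))])
        W = np.array(self.weights_)[labels]        # that cluster's weights, row by row
        return np.einsum("ij,ij->i", A, W)         # row-wise dot product


def make_data(n, rng):
    """Toy data: two separated regimes whose target depends linearly, but differently, on X."""
    half = n // 2
    X = np.vstack([rng.normal(-5.0, 1.0, (half, 2)), rng.normal(5.0, 1.0, (n - half, 2))])
    y = np.where(X[:, 0] < 0, X @ np.array([1.0, 2.0]), X @ np.array([-3.0, 0.5]))
    return X, y + rng.normal(0.0, 0.1, n)


rng = np.random.default_rng(42)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)

model = ClusterRegressor(n_clusters=2).fit(X_train, y_train)
cluster_mae = np.mean(np.abs(model.predict(X_test) - y_test))
# n_clusters=1 is a single global ridge model, used here as a baseline.
global_model = ClusterRegressor(n_clusters=1).fit(X_train, y_train)
global_mae = np.mean(np.abs(global_model.predict(X_test) - y_test))
print(f"cluster MAE = {cluster_mae:.3f}, global MAE = {global_mae:.3f}")
```

Setting `n_clusters=1` reduces the model to one global ridge regression, which makes a convenient baseline: on this synthetic piecewise-linear data the per-cluster models can follow each regime's slope while a single model cannot. How many clusters to use, and whether real expression data shows such structure, is left open by the sketch.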

Results

The performance of the proposed method is measured on expression data from GEO (Gene Expression Omnibus) and GTEx (Genotype-Tissue Expression). In terms of mean absolute error averaged across target genes, the proposed method significantly outperforms previous approaches on the GEO expression data.
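The evaluation metric can be made concrete with a short sketch. Assuming the test-set predictions are arranged in a matrix with one column per target gene, it computes the mean absolute error of each target gene over the test samples and then averages those per-gene values. The abstract does not state the exact averaging order, although with the same number of test samples for every gene both orders give the same number. `mae_across_genes` is a hypothetical helper.

```python
import numpy as np


def mae_across_genes(Y_true, Y_pred):
    """Y_true, Y_pred: arrays of shape (n_samples, n_target_genes)."""
    per_gene = np.abs(Y_true - Y_pred).mean(axis=0)  # MAE of each target gene
    return per_gene.mean(), per_gene                 # average over target genes


# Two test samples, two target genes (toy values).
Y_true = np.array([[1.0, 2.0], [3.0, 4.0]])
Y_pred = np.array([[1.5, 2.0], [3.0, 1.0]])
avg, per_gene = mae_across_genes(Y_true, Y_pred)
print(per_gene.tolist(), float(avg))  # [0.25, 1.5] 0.875
```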

Conclusions

The experimental results show that combining clustering and regression can outperform state-of-the-art methods such as generative adversarial networks and a gradient-boosting-based method.

Fig. 1

Acknowledgements

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2018R1D1A1B07047156).

Author information

Contributions

H.S.S. conceived the study, implemented the algorithm, analyzed the experimental results, and drafted the manuscript.

Corresponding author

Correspondence to Ho-Sik Seok.

Ethics declarations

Competing interests

The author declares that he has no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Seok, HS. Enhancing performance of gene expression value prediction with cluster-based regression. Genes Genom 43, 1059–1064 (2021). https://doi.org/10.1007/s13258-021-01128-6

  • DOI: https://doi.org/10.1007/s13258-021-01128-6
