Performance comparison of dimensionality reduction methods on RNA-Seq data from the GTEx project

  • Research Article
Genes & Genomics

Abstract

Background

A defining characteristic of bioinformatics data is the combination of a very large number of features and a relatively small number of samples. The vast number of features makes intuitive understanding of a target domain difficult. Dimensionality reduction, or manifold learning, has the potential to circumvent this obstacle, but in practice only a restricted set of methods has been preferred.

Objective

The objective of this study is to observe the characteristics of various dimensionality reduction methods—locally linear embedding (LLE), multi-dimensional scaling (MDS), principal component analysis (PCA), spectral embedding (SE), and t-distributed Stochastic Neighbor Embedding (t-SNE)—on the RNA-Seq dataset from the genotype-tissue expression (GTEx) project.

Results

The characteristics of the dimensionality reduction methods are observed on nine groups from three different tissues in reduced spaces of two, three, and four dimensions. The visualizations show that each dimensionality reduction method produces a very distinct reduced space. Quantitative results are obtained as the performance of k-means clustering in each reduced space. Clustering in the reduced spaces produced by non-linear methods such as LLE, t-SNE, and SE achieved better results than clustering in the reduced spaces produced by linear methods such as PCA and MDS.
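
The evaluation described above (reduce to two, three, and four dimensions, then cluster with k-means) could be sketched roughly as follows. This is a minimal illustration assuming scikit-learn, not the author's code. The synthetic data from `make_blobs` is a stand-in for the GTEx expression matrix, the adjusted Rand index is an assumed score because the abstract does not name the metric, and every hyperparameter (`n_neighbors`, `cluster_std`, the seed) is hypothetical.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE, LocallyLinearEmbedding, SpectralEmbedding
from sklearn.metrics import adjusted_rand_score

N_GROUPS = 9  # the abstract reports nine groups; everything else here is assumed


def make_reducers(n_components, seed=0):
    """Build one instance of each of the five methods named in the abstract."""
    return {
        "PCA": PCA(n_components=n_components, random_state=seed),
        "MDS": MDS(n_components=n_components, random_state=seed),
        "LLE": LocallyLinearEmbedding(
            n_neighbors=15, n_components=n_components, random_state=seed
        ),
        "SE": SpectralEmbedding(n_components=n_components, random_state=seed),
        # The default Barnes-Hut solver only supports fewer than four output
        # dimensions, so the exact solver is used to allow n_components=4.
        "t-SNE": TSNE(n_components=n_components, method="exact", random_state=seed),
    }


def evaluate(X, y, dims=(2, 3, 4), seed=0):
    """Return {(method, dimension): adjusted Rand index of k-means labels}."""
    scores = {}
    for d in dims:
        for name, reducer in make_reducers(d, seed).items():
            Z = reducer.fit_transform(X)  # embed the samples in d dimensions
            pred = KMeans(
                n_clusters=N_GROUPS, n_init=10, random_state=seed
            ).fit_predict(Z)
            scores[(name, d)] = adjusted_rand_score(y, pred)
    return scores


# Synthetic stand-in data: 180 samples, 50 features, nine labelled groups.
X, y = make_blobs(
    n_samples=180, n_features=50, centers=N_GROUPS, cluster_std=6.0, random_state=0
)
scores = evaluate(X, y)
for (name, d), score in sorted(scores.items()):
    print(f"{name:6s} d={d}  ARI={score:.3f}")
```

Scores on this synthetic data say nothing about the ranking reported for GTEx. The sketch only shows the shape of the comparison: one embedding per method and dimensionality, each scored with the same clustering step.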

Conclusions

The experimental results suggest applying both linear and non-linear dimensionality reduction methods to the target data in order to grasp the underlying characteristics of a dataset intuitively.

Acknowledgements

This study was supported by a 2017 Research Grant from Kangwon National University and by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2016R1D1A1B03931615 and 2018R1D1A1B07047156).

Author information

Contributions

H-SS conceived the study, carried out data analysis, and drafted the manuscript.

Corresponding author

Correspondence to Ho-Sik Seok.

Ethics declarations

Conflict of interest

The author declares that he has no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 2225 kb)

About this article

Cite this article

Seok, HS. Performance comparison of dimensionality reduction methods on RNA-Seq data from the GTEx project. Genes Genom 42, 225–234 (2020). https://doi.org/10.1007/s13258-019-00896-6

  • DOI: https://doi.org/10.1007/s13258-019-00896-6
