Enhancing Understandability of Omics Data with SHAP, Embedding Projections and Interactive Visualisations

Qu, Zhonglin; Tegegne, Yezihalem; Simoff, Simeon J.; Kennedy, Paul J.; Catchpoole, Daniel R.; Nguyen, Quang Vinh

doi:10.1007/978-981-19-8746-5_5

ORCID (Zhonglin Qu): orcid.org/0000-0003-4500-004X

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1741)

Included in the following conference series:

Australasian Conference on Data Mining

Abstract

Uniform Manifold Approximation and Projection (UMAP) is a recent and effective non-linear dimensionality reduction (DR) method that is increasingly applied in biomedical informatics. However, its data transformation process is complicated and lacks transparency. Principal component analysis (PCA), by contrast, is a conventional and essential DR method for analysing single-cell datasets; its projection is linear and easy to interpret. UMAP is more scalable and accurate, but its complex algorithm makes it hard to earn users' trust. A further challenge is that some single-cell datasets have so many dimensions that computation becomes inefficient and less accurate. This paper uses linked, interactive visualisations to help users understand UMAP results by comparing them with PCA results. An explainable machine learning method, SHapley Additive exPlanations (SHAP) run on a Random Forest (RF), is used to optimise the input single-cell data so that the UMAP and PCA processes become more efficient. We demonstrate that this approach can be applied to high-dimensional omics data exploration to visually validate informative molecular markers and cell populations identified in the UMAP-reduced space.
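To make the idea behind the SHAP-based input optimisation concrete, the following is a minimal sketch, not the authors' implementation. It computes exact Shapley values by brute-force enumeration of feature orderings for a hypothetical toy model, then keeps the features with the largest absolute contributions. The names `toy_model`, `x`, `baseline`, `value_fn` and `kept` are illustrative assumptions; the paper itself applies SHAP to a Random Forest, which this standard-library sketch does not reproduce.

```python
from itertools import permutations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values: average each feature's marginal contribution
    over every possible ordering of the features."""
    phi = [0.0] * n_features
    for order in permutations(range(n_features)):
        present = set()
        prev = value_fn(frozenset(present))
        for j in order:
            present.add(j)
            cur = value_fn(frozenset(present))
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    n_orders = factorial(n_features)
    return [p / n_orders for p in phi]


# Hypothetical toy model standing in for a trained classifier's output.
# Feature 1 has no effect; features 2 and 3 only matter together.
def toy_model(v):
    return 3.0 * v[0] + 0.0 * v[1] + 2.0 * v[2] * v[3]


x = [1.0, 5.0, 2.0, 1.0]         # one cell's (made-up) marker values
baseline = [0.0, 0.0, 0.0, 0.0]  # reference values for "absent" features


def value_fn(subset):
    # Features outside the coalition are replaced by their baseline value.
    mixed = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return toy_model(mixed)


phi = shapley_values(value_fn, len(x))

# Rank features by absolute contribution and keep the three strongest.
ranked = sorted(range(len(x)), key=lambda j: -abs(phi[j]))
kept = sorted(ranked[:3])

print("Shapley values:", phi)
print("Kept feature indices:", kept)
```

The contributions sum to the difference between the model output for `x` and for `baseline`, which is the additive property that gives SHAP its name. Brute-force enumeration grows factorially with the number of features, so it only suits tiny examples; on real single-cell data one would presumably rely on an approximate or tree-specific explainer and pass only the retained columns on to UMAP and PCA.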

Notes

1. https://www.tableau.com/

Acknowledgement

We thank Yu “Max” Qian of the J. Craig Venter Institute and BioLegend for providing the TotalSeq dataset used in the paper. All datasets used in the study are fully de-identified and do not contain any protected health information.

Author information

Authors and Affiliations

School of Computer, Data and Mathematical Sciences, Western Sydney University, Penrith, Australia
Zhonglin Qu & Yezihalem Tegegne
MARCS Institute and School of Computer, Data and Mathematical Sciences, Western Sydney University, Sydney, Australia
Simeon J. Simoff & Quang Vinh Nguyen
Australian Artificial Intelligence Institute, Faculty of Engineering and Information Technology Sydney, Sydney, NSW, Australia
Paul J. Kennedy
The Tumour Bank, Children’s Cancer Research Unit, Kids Research, The Children’s Hospital at Westmead, Westmead, Australia
Daniel R. Catchpoole
The Discipline of Paediatrics and Child Health, The Faculty of Medicine, The University of Sydney, Camperdown, Australia
Daniel R. Catchpoole
Faculty of Information Technology, The University of Technology Sydney, Sydney, Australia
Daniel R. Catchpoole

Corresponding author

Correspondence to Zhonglin Qu.

Editor information

Editors and Affiliations

Western Sydney University, Sydney, NSW, Australia
Laurence A. F. Park
Victoria University of Wellington, Wellington, New Zealand
Heitor Murilo Gomes
Auckland University of Technology, Auckland, New Zealand
Maryam Doborjeh
RMIT University, Melbourne, VIC, Australia
Yee Ling Boo
University of Auckland, Auckland, New Zealand
Yun Sing Koh
CSIRO Scientific Computing, Canberra, ACT, Australia
Yanchang Zhao
Australian National University, Canberra, ACT, Australia
Graham Williams
Western Sydney University, Sydney, NSW, Australia
Simeon Simoff

About this paper

Cite this paper

Qu, Z., Tegegne, Y., Simoff, S.J., Kennedy, P.J., Catchpoole, D.R., Nguyen, Q.V. (2022). Enhancing Understandability of Omics Data with SHAP, Embedding Projections and Interactive Visualisations. In: Park, L.A.F., et al. Data Mining. AusDM 2022. Communications in Computer and Information Science, vol 1741. Springer, Singapore. https://doi.org/10.1007/978-981-19-8746-5_5

DOI: https://doi.org/10.1007/978-981-19-8746-5_5
Published: 05 December 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-8745-8
Online ISBN: 978-981-19-8746-5
eBook Packages: Computer Science, Computer Science (R0)
