Visualizing High-Dimensional Data Using t-Distributed Stochastic Neighbor Embedding Algorithm

Principles of Data Science

Abstract

Data visualization is a powerful tool, widely adopted by organizations because it helps them extract the right information and understand and interpret results clearly and easily. The real challenge in any data science exploration is visualizing the data. Bar plots and pie charts are effective ways to explore a discrete, categorical attribute. Most datasets, however, have a large number of features; in other words, the data is distributed across many dimensions. Visually exploring such high-dimensional data becomes challenging, and doing so manually can be practically impossible. It is therefore essential to understand how to visualize high-dimensional datasets. t-Distributed stochastic neighbor embedding (t-SNE) is a dimensionality reduction technique that is particularly well suited to visualizing high-dimensional datasets.
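As a rough illustration of the idea behind t-SNE, the sketch below computes the similarities that the standard formulation assigns to pairs of points in the low-dimensional map, using a Student-t kernel with one degree of freedom. It is a minimal sketch rather than the chapter's own code: the function name `student_t_similarities` and the three-point embedding are hypothetical, and the high-dimensional similarities and the optimization step are omitted.

```python
def student_t_similarities(points):
    """Return q[i][j], the normalized Student-t similarity of map points i and j."""
    n = len(points)
    # Unnormalized kernel: 1 / (1 + squared Euclidean distance); diagonal stays 0.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                w[i][j] = 1.0 / (1.0 + d2)
    # Normalize over all ordered pairs so the similarities sum to 1.
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]


# Hypothetical 2-D embedding: two nearby points and one distant point.
embedding = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
q = student_t_similarities(embedding)

# The nearby pair (0, 1) receives more similarity mass than the distant pair (0, 2).
print(q[0][1] > q[0][2])
```

The heavy tail of this kernel lets moderately distant points sit far apart in the map without a large penalty, which is one reason the method tends to produce well-separated clusters in two dimensions.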

Author information

Correspondence to Jayesh Soni.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Soni, J., Prabakar, N., Upadhyay, H. (2020). Visualizing High-Dimensional Data Using t-Distributed Stochastic Neighbor Embedding Algorithm. In: Arabnia, H.R., Daimi, K., Stahlbock, R., Soviany, C., Heilig, L., Brüssau, K. (eds) Principles of Data Science. Transactions on Computational Science and Computational Intelligence. Springer, Cham. https://doi.org/10.1007/978-3-030-43981-1_9

  • DOI: https://doi.org/10.1007/978-3-030-43981-1_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-43980-4

  • Online ISBN: 978-3-030-43981-1

  • eBook Packages: Engineering, Engineering (R0)
