Parametric t-Distributed Stochastic Exemplar-Centered Embedding

  • Martin Renqiang Min
  • Hongyu Guo
  • Dinghan Shen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11051)


Parametric embedding methods such as parametric t-distributed Stochastic Neighbor Embedding (pt-SNE) enable out-of-sample data visualization without further computationally expensive optimization or approximation. However, pt-SNE favors small mini-batches for training a deep neural network but large mini-batches for approximating its cost function, which involves all pairwise comparisons of data points, and thus has difficulty striking a balance. To resolve this conflict, we present parametric t-distributed stochastic exemplar-centered embedding. Our strategy learns embedding parameters by comparing training data only with precomputed exemplars to indirectly preserve local neighborhoods, resulting in a cost function with significantly reduced computational and memory complexity. Moreover, we propose a shallow embedding network with high-order feature interactions for data visualization, which is much easier to tune than the deep feedforward neural network employed by pt-SNE yet achieves comparable performance. We empirically demonstrate on several benchmark datasets that our proposed method significantly outperforms pt-SNE in terms of robustness, visual effects, and quantitative evaluations.
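As a rough illustration of the exemplar-centered idea described in the abstract, the following minimal sketch compares each data point only with a small set of exemplars, so the similarity tables have n × z entries rather than n × n. It is not the authors' implementation: the fixed Gaussian bandwidth `sigma2`, the per-point normalization, the random choice of exemplars (standing in for precomputed ones), the linear map `embed` (standing in for the embedding network), and all function names are assumptions made for this example.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exemplar_conditionals(points, exemplars, kernel):
    """Similarity of each point to each exemplar, normalized per point (n x z)."""
    rows = []
    for x in points:
        weights = [kernel(sq_dist(x, e)) for e in exemplars]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def exemplar_kl_cost(P, Q, eps=1e-12):
    """Sum over points of KL(P_i || Q_i); n * z terms instead of n * n."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p_row, q_row in zip(P, Q)
        for p, q in zip(p_row, q_row)
    )


# Toy data: n points in dim dimensions, compared against z exemplars.
random.seed(0)
n, dim, z = 200, 5, 10
X = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

# Assumption: randomly chosen points stand in for precomputed exemplars.
X_ex = [X[i] for i in random.sample(range(n), z)]

# Assumption: a hypothetical linear map to 2-D replaces the embedding network.
W = [[random.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(2)]


def embed(x):
    """Project a point to 2-D with the placeholder linear map W."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


Y = [embed(x) for x in X]
Y_ex = [embed(e) for e in X_ex]

sigma2 = 1.0  # assumed fixed bandwidth for the high-dimensional kernel
P = exemplar_conditionals(X, X_ex, lambda d2: math.exp(-d2 / (2.0 * sigma2)))
Q = exemplar_conditionals(Y, Y_ex, lambda d2: 1.0 / (1.0 + d2))  # Student-t kernel

cost = exemplar_kl_cost(P, Q)
print(f"exemplar-centered KL cost over {n} x {z} pairs: {cost:.4f}")
```

In this sketch the cost touches 200 × 10 = 2,000 point-exemplar pairs, whereas an all-pairs cost over the same data would touch 200 × 200 = 40,000; in practice the parameters of the embedding function would be trained by gradient descent on such a cost.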



We thank Hans Peter Graf, Farley Lai and Yitong Li for helpful discussions. We thank anonymous reviewers for valuable comments.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. NEC Labs America, Princeton, USA
  2. National Research Council Canada, Ottawa, Canada
  3. Duke University, Durham, USA
