Visualizing non-metric similarities in multiple maps

Abstract

Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities in a visualization, and it cannot faithfully visualize “central” objects. In this paper, we present an extension of a recently proposed multidimensional scaling technique called t-SNE. The extension aims to address the problems of traditional multidimensional scaling techniques when these techniques are used to visualize non-metric similarities. The new technique, called multiple maps t-SNE, alleviates these problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word association data and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.
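
The metric limitation described above can be made concrete with the triangle inequality: in any metric map, d(a, c) can never exceed d(a, b) + d(b, c), so two objects that are both close to a third cannot be placed far apart. The following minimal Python sketch checks a dissimilarity matrix for such violations. The three word labels and the numeric values are hypothetical and chosen only for illustration; they are not data from the paper, and the helper `triangle_violations` is not part of any t-SNE implementation.

```python
from itertools import permutations


def triangle_violations(dissim, labels):
    """Return (a, b, c) label triples for which d(a, c) > d(a, b) + d(b, c).

    `dissim` is a symmetric square matrix (list of lists) of pairwise
    dissimilarities. Each violating triple is reported once, with i < k.
    """
    n = len(labels)
    violations = []
    for i, j, k in permutations(range(n), 3):
        # i < k avoids reporting the mirrored triple (c, b, a) a second time.
        if i < k and dissim[i][k] > dissim[i][j] + dissim[j][k]:
            violations.append((labels[i], labels[j], labels[k]))
    return violations


# Hypothetical intransitive dissimilarities: "tie" is close to both
# "tuxedo" and "rope", but "tuxedo" and "rope" are far apart.
labels = ["tuxedo", "tie", "rope"]
dissim = [
    [0.0, 1.0, 9.0],
    [1.0, 0.0, 1.0],
    [9.0, 1.0, 0.0],
]

violations = triangle_violations(dissim, labels)
print(violations)
```

Any triple reported here cannot be reproduced exactly by distances in a single metric map, whatever its dimensionality, which is the kind of structure a collection of complementary maps is meant to accommodate.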

Author information

Corresponding author

Correspondence to Laurens van der Maaten.

Additional information

Editor: Paolo Frasconi.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

van der Maaten, L., Hinton, G. Visualizing non-metric similarities in multiple maps. Mach Learn 87, 33–55 (2012). https://doi.org/10.1007/s10994-011-5273-4

Keywords

  • Multidimensional scaling
  • Embedding
  • Data visualization
  • Non-metric similarities