Abstract
Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities in a visualization, and it cannot faithfully visualize “central” objects. In this paper, we present an extension of a recently proposed multidimensional scaling technique called t-SNE. The extension aims to address the problems of traditional multidimensional scaling techniques when these techniques are used to visualize non-metric similarities. The new technique, called multiple maps t-SNE, alleviates these problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word association data and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.
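To make the limitation concrete, the following minimal sketch checks a small dissimilarity matrix for violations of the triangle inequality, which any metric map must satisfy. The three words and their dissimilarity values are hypothetical and chosen only for illustration; they are not taken from the data sets used in this paper, and the helper `triangle_violations` is likewise an assumed name. If "tie" is associated with both "tuxedo" and "rope" while "tuxedo" and "rope" are unrelated, no placement of three points in a single metric map can reproduce all three dissimilarities.

```python
from itertools import permutations


def triangle_violations(d, labels):
    """Return triples (a, b, c) for which d(a, c) > d(a, b) + d(b, c).

    `d` is a symmetric matrix of pairwise dissimilarities given as a
    list of lists, and `labels` names its rows and columns.
    """
    n = len(labels)
    found = []
    for i, j, k in permutations(range(n), 3):
        # Require i < k so each unordered endpoint pair is reported once.
        if i < k and d[i][k] > d[i][j] + d[j][k]:
            found.append((labels[i], labels[j], labels[k]))
    return found


# Hypothetical dissimilarities (illustrative values only):
# "tie" is close to both "tuxedo" and "rope", which are far apart.
labels = ["tuxedo", "tie", "rope"]
d = [
    [0.0, 1.0, 5.0],
    [1.0, 0.0, 1.0],
    [5.0, 1.0, 0.0],
]

violations = triangle_violations(d, labels)
print(violations)
```

A non-empty result means the dissimilarities cannot be embedded exactly in any single metric space, regardless of its dimensionality, which is the situation that motivates using several maps.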
Additional information
Editor: Paolo Frasconi.
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
van der Maaten, L., Hinton, G. Visualizing non-metric similarities in multiple maps. Mach Learn 87, 33–55 (2012). https://doi.org/10.1007/s10994-011-5273-4
DOI: https://doi.org/10.1007/s10994-011-5273-4
Keywords
- Multidimensional scaling
- Embedding
- Data visualization
- Non-metric similarities