Machine Learning, Volume 87, Issue 1, pp 33–55

Visualizing non-metric similarities in multiple maps

Open Access Article

Abstract

Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data, such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities, in which an object is similar to two other objects that are themselves dissimilar, and it cannot faithfully visualize "central" objects that are similar to a large fraction of all other objects. In this paper, we present an extension of the recently proposed multidimensional scaling technique t-SNE that addresses these problems when visualizing non-metric similarities. The new technique, called multiple maps t-SNE, alleviates the problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word associations and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.

Keywords

Multidimensional scaling · Embedding · Data visualization · Non-metric similarities
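
Because the abstract describes multiple maps t-SNE only at a high level, a minimal sketch may help make the construction concrete. The sketch below, written with JAX, assumes the standard multiple-maps formulation: every object receives a location in each of M two-dimensional maps together with a learned importance weight per map; the low-dimensional similarity q_ij mixes Student-t similarities across maps, weighted by both objects' importances in each map; and the Kullback-Leibler divergence between the input similarities P and the induced similarities Q is minimized by gradient descent. All names and hyperparameters here (multiple_maps_tsne_loss, the softmax parameterization of the importance weights, the learning rate) are illustrative assumptions, not the authors' reference implementation.

```python
# A minimal sketch of the multiple-maps idea, using JAX for automatic
# differentiation. Illustrative only; hyperparameters are placeholders.
import jax
import jax.numpy as jnp

def multiple_maps_tsne_loss(params, P, eps=1e-12):
    """KL(P || Q), where Q mixes Student-t similarities from several maps.

    params['Y']: (M, N, 2) point coordinates, one 2-D layout per map.
    params['W']: (N, M) unconstrained logits; a softmax over maps gives
                 each object's importance weight pi_i^(m) in map m
                 (an assumed parameterization that keeps the weights
                 positive and summing to one across maps).
    P:           (N, N) symmetric input probabilities, zero diagonal.
    """
    Y, W = params['Y'], params['W']
    pi = jax.nn.softmax(W, axis=1)                      # (N, M) importance weights
    # Squared Euclidean distances within each map: (M, N, N).
    diff = Y[:, :, None, :] - Y[:, None, :, :]
    d2 = jnp.sum(diff ** 2, axis=-1)
    t = 1.0 / (1.0 + d2)                                # Student-t kernel per map
    # Weight each map's similarity by both objects' importance in that map.
    w_pair = pi.T[:, :, None] * pi.T[:, None, :]        # (M, N, N)
    num = jnp.sum(w_pair * t, axis=0)                   # (N, N) unnormalized q_ij
    num = num * (1.0 - jnp.eye(num.shape[0]))           # exclude self-similarities
    Q = num / jnp.sum(num)                              # normalize over all pairs
    return jnp.sum(P * jnp.log((P + eps) / (Q + eps)))  # KL(P || Q)

@jax.jit
def step(params, P, lr=10.0):
    """One plain gradient-descent step on the points and importance logits."""
    grads = jax.grad(multiple_maps_tsne_loss)(params, P)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

In use, one would initialize Y with small Gaussian noise and W at zero, then run step for a few hundred iterations; refinements such as momentum and early exaggeration, common in t-SNE optimization, are omitted here for brevity.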

Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. Pattern Recognition and Bioinformatics Laboratory, Delft University of Technology, Delft, The Netherlands
  2. Department of Computer Science, University of Toronto, Toronto, Canada