Machine Learning, Volume 99, Issue 2, pp 189–229

Information retrieval approach to meta-visualization

Abstract

Visualization is crucial in the first steps of data analysis. In visual data exploration with scatter plots, no single plot is sufficient to analyze complicated high-dimensional data sets. Given numerous visualizations created with different features or methods, meta-visualization is needed to analyze the visualizations together. We address how to arrange numerous visualizations on a meta-visualization display, so that their similarities and differences can be analyzed. Visualization has recently been formalized as an information retrieval task; we extend this approach and formalize meta-visualization as an information retrieval task whose performance can be rigorously quantified and optimized. We introduce a machine learning approach to optimize the meta-visualization, based on an information retrieval perspective: two visualizations are similar if the analyst would retrieve similar neighborhoods between data samples from either visualization. Based on the approach, we introduce a nonlinear embedding method for meta-visualization: it optimizes locations of visualizations on a display, so that visualizations giving similar information about data are close to each other. In experiments, we show that such meta-visualization outperforms alternatives and yields insight into data in several case studies.
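
The neighborhood-retrieval notion of similarity described in the abstract can be illustrated with a small sketch. This is not the method of the article: it assumes hard k-nearest-neighbor sets and a plain overlap score as a simplified stand-in, and the function names (`knn_sets`, `neighborhood_dissimilarity`, `pairwise_dissimilarities`) and the parameter `k` are hypothetical, introduced only for illustration. Each visualization is assumed to be an array of 2-D coordinates for the same data samples in the same order.

```python
import numpy as np


def knn_sets(points, k):
    """For each sample, return the set of indices of its k nearest neighbors."""
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbor
    nearest = np.argsort(dists, axis=1)[:, :k]
    return [set(row) for row in nearest]


def neighborhood_dissimilarity(plot_a, plot_b, k=5):
    """One minus the mean fraction of k-nearest neighbors shared by two plots.

    Assumption for illustration only: both plots show the same samples,
    and "retrieved neighborhood" is approximated by a hard k-NN set.
    """
    sets_a = knn_sets(plot_a, k)
    sets_b = knn_sets(plot_b, k)
    overlap = [len(a & b) / k for a, b in zip(sets_a, sets_b)]
    return 1.0 - float(np.mean(overlap))


def pairwise_dissimilarities(plots, k=5):
    """Symmetric matrix of dissimilarities between all pairs of plots."""
    n = len(plots)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = neighborhood_dissimilarity(plots[i], plots[j], k)
    return d
```

Under these assumptions, two plots that differ only by a uniform rescaling get dissimilarity zero, because an analyst would retrieve the same neighbors from either one. A matrix from `pairwise_dissimilarities` could then, in principle, be passed to any embedding method that accepts pairwise dissimilarities in order to place the plots on a display.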

Keywords

Meta-visualization, Neighbor embedding, Nonlinear dimensionality reduction

Copyright information

© The Author(s) 2014

Authors and Affiliations

  1. Helsinki Institute for Information Technology HIIT, Department of Information and Computer Science, Aalto University, Aalto, Finland
  2. School of Information Sciences, University of Tampere, Tampere, Finland
