# Information retrieval approach to meta-visualization

## Abstract

Visualization is crucial in the first steps of data analysis. In visual data exploration with scatter plots, no single plot is sufficient to analyze complicated high-dimensional data sets. Given numerous visualizations created with different features or methods, meta-visualization is needed to analyze the visualizations together. We address the problem of *how to arrange numerous visualizations onto a meta-visualization display* so that their similarities and differences can be analyzed. Visualization has recently been formalized as an information retrieval task; we extend this approach and formalize meta-visualization as an information retrieval task whose performance can be rigorously quantified and optimized. We introduce a machine learning approach to optimize the meta-visualization, based on an information retrieval perspective: two visualizations are similar if the analyst would retrieve similar neighborhoods between data samples from either visualization. Based on this approach, we introduce a nonlinear embedding method for meta-visualization: it optimizes the locations of visualizations on a display so that visualizations giving similar information about the data are close to each other. In experiments, we show that such meta-visualization outperforms alternatives and yields insight into data in several case studies.
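The retrieval-based notion of similarity between visualizations can be illustrated with a small Python sketch. It is not the method of the article: hard k-nearest-neighbor sets and Jaccard overlap are simplifying assumptions made here for illustration, and all function names (`knn_sets`, `neighborhood_dissimilarity`, `pairwise_plot_dissimilarity`) are hypothetical. Each plot is assumed to be an array of 2D coordinates for the same data samples, in the same order.

```python
import numpy as np


def knn_sets(points, k):
    """For each sample, return the set of indices of its k nearest neighbors in one plot."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    return [set(row.tolist()) for row in nearest]


def neighborhood_dissimilarity(plot_a, plot_b, k=5):
    """One minus the mean Jaccard overlap of the k-NN sets retrieved from two plots.

    Assumption: hard k-NN sets stand in for the neighborhoods an analyst
    would retrieve; this is a simplification chosen for this sketch.
    """
    sets_a = knn_sets(plot_a, k)
    sets_b = knn_sets(plot_b, k)
    overlaps = [len(a & b) / len(a | b) for a, b in zip(sets_a, sets_b)]
    return 1.0 - float(np.mean(overlaps))


def pairwise_plot_dissimilarity(plots, k=5):
    """Symmetric matrix of neighborhood dissimilarities between all pairs of plots."""
    n = len(plots)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = neighborhood_dissimilarity(plots[i], plots[j], k)
    return d


# Toy usage: a plot, a uniformly rescaled copy of it, and an unrelated plot.
rng = np.random.default_rng(0)
plot_1 = rng.normal(size=(50, 2))
plot_2 = 2.0 * plot_1                # same neighborhoods, different scale
plot_3 = rng.normal(size=(50, 2))    # unrelated arrangement of the same samples
dissimilarities = pairwise_plot_dissimilarity([plot_1, plot_2, plot_3], k=5)
print(np.round(dissimilarities, 2))
```

A matrix like `dissimilarities` could then be passed to any embedding method that accepts pairwise dissimilarities to place the plots on a display. The article instead describes a nonlinear embedding optimized for the retrieval task itself, which this sketch does not reproduce.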

## Keywords

- Meta-visualization
- Neighbor embedding
- Nonlinear dimensionality reduction

## Notes

### Acknowledgments

The work was supported by the Academy of Finland, decisions 251170 (Finnish CoE in Computational Inference Research COIN), 252845 and 256233. The authors belong to COIN. We also acknowledge the computational resources provided by the Aalto Science-IT project.
