# Interactive Data Visualization Using Dimensionality Reduction and Similarity-Based Representations


## Abstract

This work presents a new interactive data visualization approach based on a mixture of the outcomes of dimensionality reduction (DR) methods. The mixture is a weighted sum whose weighting factors are defined by the user through a visual and intuitive interface. Additionally, the low-dimensional representation spaces produced by DR methods are graphically depicted using scatter plots enhanced with an interactive, data-driven visualization. To do so, pairwise similarities are calculated and employed to define the graph to be drawn on the scatter plot. Our visualization approach enables the user to interactively combine DR methods while being provided with information about the structure of the original data, which makes the selection of a DR scheme more intuitive.

## Keywords

Data visualization, Dimensionality reduction, Pairwise similarity

## 1 Introduction

The aim of dimensionality reduction (DR) is to obtain lower-dimensional representations of high-dimensional input data while preserving, under a pre-established criterion, the structure of the data as well as possible. Reaching this aim means that both the performance of a pattern recognition system and the intelligibility of the data representation can be improved [1]. Traditionally, DR methods are designed by following pre-established optimization criteria and design parameters. However, they mostly lack properties such as interactivity and controllability, which are important characteristics in the field of Information Visualization (InfoVis) [2]. InfoVis provides interfaces and graphical ways of representing data that make the available information more usable and intelligible for the user. DR outcomes can therefore be enhanced by taking advantage of some properties of InfoVis methods [3, 4]. Following this premise, some approaches have been proposed [5, 6] that make use of interactivity through equalizer-bar-like interfaces or geometric interaction models. In general, such approaches implement interesting interactive models, but their final visualization lacks information about the structure of the data in the original input space, at least in an easy-to-understand or visual form.

In this work, we introduce a new visualization approach using an interactive mixture of data representations resulting from DR methods. After performing the DR methods on the input data, a set of lower-dimensional representation spaces is obtained, and the mixture is computed as a weighted sum of them. In order to give users a sense of the structure of the data, we implement a data-driven visualization in addition to the conventional scatter plot. This visualization captures the structure of the input data by using a similarity matrix (also known as an affinity matrix in graph theory), which holds the degree of similarity or affinity between every pair of data points. The visualization consists of plotting lines (edges) between the data points exhibiting the highest similarity values. Additionally, to strengthen interactivity, the user can control the number of edges by varying a parameter, exposed as a slider bar within the interface. By design, the affinity is chosen to be Gaussian so that the structure of local neighborhoods can be taken into account. The low-dimensional spaces are obtained by state-of-the-art methods, namely Classical Multidimensional Scaling (CMDS) [2], Laplacian Eigenmaps (LE) [7], Locally Linear Embedding (LLE) [8], Stochastic Neighbor Embedding (SNE), and Student-t-distributed SNE (t-SNE) [1, 7]. To perform the mixture, the user sets the weighting factors by picking values from an equalizer-bar-like interface. To test our visualization approach, we use a 3D artificial spherical shell data set. The quality of the resultant representation spaces is quantified by a scaled version of the average agreement rate between K-ary neighborhoods [9]. The proposed mixture can reproduce every individual dimensionality reduction approach, and it helps users find a suitable representation of the input data within a visual and user-friendly interface.

The remainder of the paper is organized as follows: Sect. 2 outlines data visualization via dimensionality reduction. Section 3 introduces the proposed interactive data visualization scheme. The experimental setup and results are presented in Sects. 4 and 5, respectively. Finally, Sect. 6 gathers some final remarks as conclusions and future work.

## 2 Data Visualization via Dimensionality Reduction

## 3 Interactive Data Visualization Scheme

### 3.1 Mixture

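Suppose that the input data are reduced by using *M* different DR methods, yielding a set of lower-dimensional representations: \(\{{\varvec{X}}^{(1)},\cdots ,{\varvec{X}}^{(M)}\}\). Herein, we propose to perform a weighted sum in the form:

\[\bar{{\varvec{X}}} = \sum _{m=1}^{M} \alpha _{m} {\varvec{X}}^{(m)}, \qquad (1)\]

where \(\alpha _{m}\) is the weighting factor associated with the m-th DR method.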
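The weighted sum described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `mix_embeddings` and the random stand-in embeddings are hypothetical, and the sketch assumes that all M representations share the same shape (N points, d dimensions).

```python
import numpy as np

def mix_embeddings(embeddings, weights):
    """Weighted sum of M embeddings, each an array of shape (N, d)."""
    X = np.stack(embeddings, axis=0)          # shape (M, N, d)
    alpha = np.asarray(weights, dtype=float)  # shape (M,)
    if alpha.shape[0] != X.shape[0]:
        raise ValueError("one weight per embedding is required")
    # Contract the method axis: sum over m of alpha_m * X^(m)
    return np.tensordot(alpha, X, axes=1)     # shape (N, d)

# Hypothetical stand-ins for the outputs of three DR methods
rng = np.random.default_rng(0)
embeddings = [rng.standard_normal((6, 2)) for _ in range(3)]
mixed = mix_embeddings(embeddings, [0.5, 0.3, 0.2])
```

Setting one weight to 1 and the others to 0 returns the corresponding single-method embedding unchanged, which is how a mixture of this kind can reproduce each individual DR method.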

### 3.2 Interaction Model

For the sake of interactivity, the values of every \(\alpha _{m}\), required to calculate \(\bar{{\varvec{X}}}\) according to Eq. (1), are defined by the user through an equalizer bar available in the interface. Within a user-friendly and intuitive environment, weighting factors can be readily entered by simply picking values from the bars. In order to provide quick views of the resultant representation space, as soon as one value is picked, the remaining ones are automatically completed following a uniform probability density function. The same is done when more than one value is selected.
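One possible reading of the automatic completion step is sketched below. It assumes, and this is an assumption not stated in the text, that the weights are non-negative and sum to one, so that the weight left over after the user's picks is shared equally among the untouched bars. The function name `complete_weights` is hypothetical.

```python
def complete_weights(picked, n_methods):
    """Fill in the weights the user did not set.

    picked: dict mapping method index -> weight chosen by the user.
    Assumption: weights are non-negative and sum to 1, so the leftover
    mass is split uniformly among the remaining methods.
    """
    total = sum(picked.values())
    if any(w < 0 for w in picked.values()) or total > 1:
        raise ValueError("picked weights must be non-negative and sum to at most 1")
    weights = [0.0] * n_methods
    for m, w in picked.items():
        weights[m] = w
    free = [m for m in range(n_methods) if m not in picked]
    if free:
        share = (1.0 - total) / len(free)
        for m in free:
            weights[m] = share
    return weights

# The user sets only the second bar; the other four are completed uniformly
weights = complete_weights({1: 0.4}, n_methods=5)
```

Under this assumption, picking further bars simply reduces the leftover mass that is shared among the bars still unset.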

### 3.3 Similarity-Based Visualization

The most widely used method to visualize 2- or 3-dimensional data is the scatter plot. In this work, we introduce a similarity-based visualization approach that aims to embed a visual hint about the structure of the high-dimensional input data matrix \({\varvec{Y}}\) into the scatter plot of its representation in a lower-dimensional space. To do so, we use a pairwise similarity matrix \({\varvec{S}} \in \mathbb {R}^{N\times N}\), such that \({\varvec{S}}=[{\varvec{s}}_{ij}]\). In terms of graph theory, entry \({\varvec{s}}_{ij}\) defines the similarity or affinity between the i-th and j-th data points from \({\varvec{Y}}\). In this way, we retain the structure of the original input space in a topological fashion, specifically in terms of pairwise relationships. For visualization purposes, the similarity is used to depict the relationship between data points graphically by plotting edges. In order to control the number of edges and produce an appealing visual representation, only pairs satisfying \({\varvec{s}}_{ij}>{\varvec{s}}_{max}\) are drawn, where \({\varvec{s}}_{max}\) is an admissible similarity threshold that is also set by the user. In other words, our visualization approach consists of building a graph with constrained affinity values.
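The edge selection rule can be illustrated with a short NumPy sketch. The function name `edges_above_threshold` and the toy similarity matrix are hypothetical. The sketch assumes a symmetric similarity matrix and lists each unordered pair once, without self-loops.

```python
import numpy as np

def edges_above_threshold(S, threshold):
    """Return the (i, j) pairs, i < j, whose similarity exceeds the threshold."""
    S = np.asarray(S, dtype=float)
    # Upper triangle without the diagonal: each pair once, no self-loops
    i, j = np.triu_indices(S.shape[0], k=1)
    keep = S[i, j] > threshold
    return list(zip(i[keep].tolist(), j[keep].tolist()))

# Toy symmetric similarity matrix for three points
S = np.array([[1.0, 0.9, 0.1],
              [0.9, 1.0, 0.6],
              [0.1, 0.6, 1.0]])
edges = edges_above_threshold(S, 0.5)   # [(0, 1), (1, 2)]
```

Each returned pair would then be drawn as a line segment between the corresponding rows of the low-dimensional representation. Raising the threshold removes edges, which is the effect the slider bar controls.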

## 4 Experimental Setup

**Database:** In order to visually evaluate the performance of the DataVisSim approach, we use an artificial spherical shell (N = 1500 data points and D = 3), as depicted in Fig. 1.

**Parameter Settings and Methods:** In order to capture the local structure for visualization, i.e., data points that are neighbors, we use the Gaussian similarity given by \({\varvec{s}}_{ij}=\exp (-0.5||{\varvec{y}}_{(i)}-{\varvec{y}}_{(j)}||^{2}/\sigma ^{2})\). The parameter \(\sigma \) is a bandwidth value set to 0.1, which is 10% of the hypersphere radius (applicable once matrices are normalized as discussed in Sect. 3.1). To perform the dimensionality reduction, we consider \(M = 5\) DR methods, namely CMDS, LE, LLE, SNE, and t-SNE. All of them are set to produce spaces of dimension \(d=2\).
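As a rough sketch of this setup, the following code samples points on a unit-radius spherical shell and computes the Gaussian similarity with a bandwidth of 0.1. The sampling scheme (normalized Gaussian vectors, giving a uniform shell), the unit radius, and the function names are assumptions made for illustration; the data set used in the experiments may have been generated differently. A smaller N than in the paper is used to keep the example light.

```python
import numpy as np

def spherical_shell(n_points, seed=0):
    """Sample points on the unit sphere in 3D (assumed uniform sampling)."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((n_points, 3))
    return P / np.linalg.norm(P, axis=1, keepdims=True)

def gaussian_similarity(Y, sigma=0.1):
    """s_ij = exp(-0.5 * ||y_i - y_j||^2 / sigma^2) for all pairs of rows."""
    sq_norms = np.sum(Y ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    # Clamp tiny negative values caused by floating-point round-off
    np.maximum(sq_dists, 0.0, out=sq_dists)
    return np.exp(-0.5 * sq_dists / sigma ** 2)

Y = spherical_shell(200)
S = gaussian_similarity(Y, sigma=0.1)
```

With such a small bandwidth relative to the radius, the similarity decays quickly with distance, so only close neighbors receive values far from zero.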

**Performance Measure:** To quantify the performance of the studied methods, the scaled version of the average agreement rate \(R_{NX}(K)\) introduced in [9] is used, which ranges within the interval [0, 1]. Since \(R_{NX}(K)\) is calculated at each neighborhood size \(K\) from 2 to \(N-1\), a numerical indicator of the overall performance can be obtained by calculating the area under its curve (AUC). The AUC assesses the dimensionality reduction quality at all scales, with suitable weights for each scale.
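A compact sketch of this quality measure is given below. It follows the formulation commonly used for this measure, which is an assumption here rather than something taken from the text above: \(Q_{NX}(K)\) is the average fraction of shared K nearest neighbors, \(R_{NX}(K) = ((N-1)Q_{NX}(K)-K)/(N-1-K)\), K runs from 1 to N-2, and the AUC weights each K by 1/K. The function names are hypothetical, and the loop is written for clarity rather than speed.

```python
import numpy as np

def pairwise_distances(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def rank_matrix(D):
    """ranks[i, j] = position of j among the neighbors of i (0 = i itself)."""
    D = D.copy()
    np.fill_diagonal(D, -1.0)  # force each point to be its own rank-0 neighbor
    order = np.argsort(D, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(D.shape[0])[:, None]
    ranks[rows, order] = np.arange(D.shape[1])[None, :]
    return ranks

def rnx_curve(Y_high, X_low):
    """R_NX(K) for K = 1..N-2 (assumed indexing convention)."""
    N = Y_high.shape[0]
    R_high = rank_matrix(pairwise_distances(Y_high))
    R_low = rank_matrix(pairwise_distances(X_low))
    off_diag = R_high >= 1
    Ks = np.arange(1, N - 1)
    rnx = np.empty(len(Ks))
    for idx, K in enumerate(Ks):
        # Count pairs that are K-nearest neighbors in both spaces
        shared = np.sum(off_diag & (R_high <= K) & (R_low <= K))
        qnx = shared / (K * N)
        rnx[idx] = ((N - 1) * qnx - K) / (N - 1 - K)
    return Ks, rnx

def rnx_auc(Ks, rnx):
    """Area under the R_NX curve with 1/K weights (log-scale emphasis)."""
    w = 1.0 / Ks
    return float(np.sum(w * rnx) / np.sum(w))

rng = np.random.default_rng(0)
Y_high = rng.standard_normal((60, 3))
X_low = Y_high[:, :2]                 # crude 2D representation: drop one axis
Ks, rnx = rnx_curve(Y_high, X_low)
auc = rnx_auc(Ks, rnx)
```

An embedding that preserves all neighborhoods exactly yields an AUC of 1, whereas a random embedding yields a value near 0, which gives a scale for comparing the individual DR methods and their mixtures.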

## 5 Results and Discussion

To test the DataVis approach, we implement an interface in Processing, a software environment that makes it easy to code visual arts and is therefore appealing for creating visual analytics interfaces. Figure 9 shows a view of the implemented interface. The interface is designed to be easy to handle, so that users (even non-experts) may interact with DR methods and their feasible combinations in an intuitive manner using equalizer-like bars. This is possible because the resultant data representations are presented in a form suited to human perception. The interface also incorporates a slider bar to dynamically draw the edges between nodes. This is useful for visual analysis, given that it relates the structure of the high-dimensional (original) data to the visualization of the low-dimensional representation space. The interface thus provides a powerful tool for deciding on the most suitable representation of the original data or, in other words, the most appropriate DR methods.

## 6 Conclusions and Future Work

This work presents a new interactive data visualization approach based on a mixture of the outcomes of dimensionality reduction (DR) methods. The core of this approach consists of plotting lines (edges) between the data points exhibiting the highest similarity values, using a similarity matrix that measures the degree of similarity or affinity between every pair of data points and thereby captures the structure of the input data. This topology is represented by a data-driven graph drawn in addition to the conventional scatter plot, which gives the user a greater sense of interactivity when selecting or combining DR methods while providing information about the structure of the original data. Correspondingly, data points represent the nodes, and an affinity matrix holds the pairwise edge weights. As future work, other dimensionality reduction methods are to be integrated into the data-driven graph, so that a good trade-off between preservation of the data structure and intelligible data visualization can be reached. Further mathematical properties will be explored to design data-driven schemes that best approximate the topology of the data.

## References

- 1. Peluffo-Ordóñez, D.H., Lee, J.A., Verleysen, M.: Short review of dimensionality reduction methods based on stochastic neighbour embedding. In: Villmann, T., Schleif, F.-M., Kaden, M., Lange, M. (eds.) Advances in Self-Organizing Maps and Learning Vector Quantization. AISC, vol. 295, pp. 65–74. Springer, Heidelberg (2014). doi: 10.1007/978-3-319-07695-9_6
- 2. Borg, I., Groenen, P.J.: Modern Multidimensional Scaling: Theory and Applications. Springer Science & Business Media, New York (2005)
- 3. Dai, W., Hu, P.: Research on personalized behaviors recommendation system based on cloud computing. Indones. J. Electr. Eng. Comput. Sci. **12**, 1480–1486 (2013)
- 4. Ward, M.O., Grinstein, G., Keim, D.: Interactive Data Visualization: Foundations, Techniques, and Applications. CRC Press, Boca Raton (2010)
- 5. Peluffo-Ordóñez, D.H., Alvarado-Pérez, J.C., Lee, J.A., Verleysen, M., et al.: Geometrical homotopy for data visualization. In: European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2015) (2015)
- 6. Díaz, I., Cuadrado, A.A., Pérez, D., García, F.J., Verleysen, M.: Interactive dimensionality reduction for visual analytics. In: Proceedings of 22nd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN 2014), pp. 183–188. Citeseer (2014)
- 7. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. **15**, 1373–1396 (2003)
- 8. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science **290**, 2323–2326 (2000)
- 9. Lee, J.A., Renard, E., Bernard, G., Dupont, P., Verleysen, M.: Type 1 and 2 mixtures of Kullback-Leibler divergences as cost functions in dimensionality reduction based on similarity preservation. Neurocomputing **112**, 92–108 (2013)
- 10. Bertini, E., Lalanne, D.: Surveying the complementary role of automatic data analysis and visualization in knowledge discovery. In: Proceedings of ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with Interactive Exploration, pp. 12–20. ACM (2009)
- 11. Peluffo-Ordóñez, D.H., Lee, J.A., Verleysen, M.: Generalized kernel framework for unsupervised spectral methods of dimensionality reduction. In: 2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 171–177. IEEE (2014)