Introducing the Concept of Interaction Model for Interactive Dimensionality Reduction and Data Visualization
This letter formally introduces the concept of interaction model (IM), which has been used either directly or tangentially in previous works but never defined. Broadly speaking, an IM consists of the use of a mixture of dimensionality reduction (DR) techniques within an interactive data visualization framework. The rationale for creating an IM is the need to simultaneously harness the benefits of several DR approaches to reach a data representation that is intelligible and/or fitted to any user’s criterion. As a remarkable advantage, an IM naturally provides a generalized framework for designing both interactive DR approaches and ready-to-use data visualization interfaces. In addition to a comprehensive overview of the basics of data representation and dimensionality reduction, the main contribution of this manuscript is the elegant definition of the concept of IM in mathematical terms.
Keywords: Dimensionality reduction, Interaction model, Kernel functions, Data visualization
1 Introduction
Very often, dimensionality reduction (DR) is an essential building block in the design of both machine learning systems and information visualization interfaces [1, 2]. In simple terms, DR consists of finding a low-dimensional representation of the original (high-dimensional) data according to a criterion of either data-structure preservation or class separability. Recent analyses have shown that DR should attempt to reach two goals: first, to ensure that data points that are neighbors in the original space remain neighbors in the embedded space; second, to guarantee that two data points are shown as neighbors in the embedded space only if they are neighbors in the original space. In the context of information retrieval, these two goals can be seen as recall and precision measures, respectively. Although these goals clearly conflict, the compromise between precision and recall defines the performance of a DR method. Furthermore, since DR methods are often developed under fixed design parameters and a pre-established optimization criterion, they still lack properties such as user interaction and controllability. These properties are characteristic of information visualization procedures. The field of data visualization (DataVis) aims at developing graphical ways of representing data so that information becomes more usable and intelligible for the user. One can therefore intuit that DR can be improved by importing some properties of DataVis methods. This is, in fact, the premise on which this research is based.
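The two goals can be made concrete with a small neighborhood-based score. In the following sketch, which is an illustration rather than a method of this letter, the `k_high` nearest neighbors of a point in the original space form the relevant set and its `k_low` nearest neighbors in the embedded space form the retrieved set; the function names and the k-NN definition of a neighborhood are assumptions.

```python
import math


def knn(points, i: int, k: int) -> set[int]:
    """Indices of the k nearest neighbors of points[i] (Euclidean), excluding i itself."""
    ranked = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return {j for _, j in ranked[:k]}


def neighborhood_precision_recall(high, low, k_high: int, k_low: int) -> tuple[float, float]:
    """Average precision and recall of embedded neighborhoods (hypothetical helper).

    relevant  = k_high nearest neighbors in the original (high-dimensional) space
    retrieved = k_low nearest neighbors in the embedded (low-dimensional) space
    """
    n = len(high)
    precision = recall = 0.0
    for i in range(n):
        relevant = knn(high, i, k_high)
        retrieved = knn(low, i, k_low)
        hits = len(relevant & retrieved)
        precision += hits / k_low  # fraction of shown neighbors that are true neighbors
        recall += hits / k_high    # fraction of true neighbors that are shown
    return precision / n, recall / n


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
    print(neighborhood_precision_recall(pts, pts, k_high=2, k_low=2))
    print(neighborhood_precision_recall(pts, pts, k_high=2, k_low=1))
```

With the identity embedding of these four collinear points, the score is `(1.0, 1.0)` for `k_low=2` and `(1.0, 0.5)` for `k_low=1`: shrinking the displayed neighborhood keeps every shown neighbor correct but misses half of the true ones, a small instance of the precision/recall compromise.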
This emergent research area can be referred to as interactive dimensionality reduction for visualization. Its main goal is to link the field of DR with that of DataVis in order to harness the special properties of the latter within DR frameworks. In particular, the properties of controllability and interactivity are of great interest, since they should make the DR outcomes significantly more understandable and tractable for the (not necessarily expert) user. These two properties give the user the freedom to select the best way of representing the data. In other words, the goal of this project is to develop a DR framework that facilitates an interactive and quick visualization of the data representation, making the DR outcomes more intelligible and allowing users to modify the views of the data according to their needs in an affordable fashion.
In this connection, this letter formally introduces the concept of interaction model (IM) as a key tool for both interactive DR and DataVis. Even though the term interaction model has been referred to directly or tangentially in previous works [4, 5, 6, 7, 8], it has not been formally defined. This paper aims to fill that void. In general terms, the concept of IM refers to a mixture of DR techniques within an interactive data visualization framework. The rationale for creating an IM is the need to simultaneously harness the benefits of several DR approaches to reach a data representation that is intelligible and/or fitted to any user’s criterion. As a remarkable advantage, an IM naturally provides a generalized framework for designing both interactive DR approaches and ready-to-use data visualization interfaces. That said, the main contribution of this manuscript is the elegant definition of the concept of IM in mathematical terms. It also overviews some basics of data representation and dimensionality reduction from a matrix-algebra point of view. Special attention is given to spectral and kernel-based DR methods, which can be generalized by kernel principal component analysis (KPCA) and readily incorporated into a linear IM.
The remainder of this manuscript is organized as follows: Sect. 2 states the mathematical notation for the main variables and operators. Section 3 presents a short overview of basic concepts and introductory formulations for DR, with special interest in KPCA. In Sect. 4, we formally define the concept of IM as well as its particular linear version; the use of DR-based DataVis is also outlined. Finally, some concluding remarks are gathered in Sect. 5.
2 Mathematical Notation
3 Overview on Dimensionality Reduction and Data Visualization
3.1 Data Representation
Data representation is a broad term used by some authors to refer, by and large, to either data transformation or feature extraction. The former consists of transforming data into a new version intended to fulfill a specific goal. The latter is a sort of data remaking in which input data undergo a morphological deformation, or projection (also called rotation), following a certain transformation criterion. Data representation may also refer to yielding a new representation of the data, that is, a new data matrix that serves as an alternative to the originally given one; a dissimilarity-based representation of the input data is one example. An exhaustive review of data representation is presented in .
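A dissimilarity-based representation can be sketched as follows: each input point is replaced by its vector of distances to a set of prototypes, so the new data matrix has one column per prototype. The Euclidean distance, the choice of prototypes, and the function name are assumptions made for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def dissimilarity_representation(
    data: Sequence[Point], prototypes: Sequence[Point]
) -> list[list[float]]:
    """New data matrix D with D[i][j] = Euclidean distance from data[i] to prototypes[j]."""
    return [[math.dist(x, p) for p in prototypes] for x in data]


if __name__ == "__main__":
    X = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
    # Using the first two points as prototypes maps each point to a 2-D distance profile.
    for row in dissimilarity_representation(X, X[:2]):
        print(row)
```

The dimension of the new representation is set by the number of prototypes rather than by the original number of features, which is what makes it an alternative data matrix rather than a projection.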
3.2 Dimensionality Reduction
DR Approaches: Pioneering approaches such as principal component analysis (PCA) and classical multidimensional scaling (CMDS) optimize the reduction in terms of variance- and distance-preservation criteria, respectively. More sophisticated methods attempt to capture the data topology through an undirected, weighted, data-driven graph, whose nodes are located at the geometrical coordinates given by the data points and whose pairwise edge weights are held in a non-negative similarity (also called affinity, or Gram) matrix. Such data-topology-based criteria have been addressed by both spectral and divergence-based methods. The similarity matrix can represent either a weighting factor for pairwise distances, as happens in Laplacian eigenmaps and locally linear embedding [17, 18], or a probability distribution, as is the case for divergence-based methods such as stochastic neighbour embedding. In this letter, we give special interest to spectral approaches, more specifically to the so-called kernel PCA.
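A minimal sketch of such a data-driven graph is given below. It assumes a fully connected graph with Gaussian (heat-kernel) edge weights and a hypothetical bandwidth `sigma`; practical methods such as Laplacian eigenmaps usually sparsify the graph with k nearest neighbors first. The unnormalized Laplacian L = D - W, with D the diagonal degree matrix, is the matrix on which Laplacian eigenmaps builds its eigenproblem.

```python
import math


def gaussian_affinity(points, sigma: float) -> list[list[float]]:
    """Symmetric, non-negative similarity matrix of a fully connected graph.

    W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for i != j; the diagonal is zero (no self-loops).
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = math.dist(points[i], points[j]) ** 2
            w[i][j] = w[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w


def graph_laplacian(w: list[list[float]]) -> list[list[float]]:
    """Unnormalized graph Laplacian L = D - W, with D the diagonal matrix of node degrees."""
    n = len(w)
    degrees = [sum(row) for row in w]
    return [[(degrees[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
    W = gaussian_affinity(pts, sigma=1.0)
    L = graph_laplacian(W)
    # Every row of L sums to (numerically) zero, so the constant vector lies in its null space.
    print([round(sum(row), 12) for row in L])
```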
3.3 DataVis via Interactive DR
Quite intuitively, one can infer that the premise underlying the use of DR for DataVis purposes is to make the information of a high-dimensional dataset directly intelligible by displaying it in a representation of three or fewer dimensions.
Moreover, incorporating interactivity into the DR technique itself, or into DR-based DataVis interfaces, enables users (even non-expert ones) to select a method or tune its parameters in an intuitive fashion.
4 Concept of Interaction Model (IM) for DR
4.1 Definition of IM
Herein, interaction is regarded as the ability to readily incorporate the user’s criterion into the stages of the data exploration process; in this case, DR is the stage of interest. In particular, by interactivity we mean the possibility of tuning parameters or selecting methods within an interactive interface. As traditionally done in previous works, interactivity is achieved through a mixture of functions or elements representing DR techniques. In the following, we formally define the concept of IM:
4.2 Linear IM
As a particular case of Definition 1, we can define the linear IM as follows:
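As an assumed illustration only (the symbols below are our own and need not match the notation of Sect. 2), suppose each of M DR techniques is represented by an N x N kernel matrix K^{(m)}, as Sect. 4.3 suggests, and that the user controls non-negative mixing weights. A linear IM could then take the form of a convex combination:

```latex
\bar{\mathbf{K}} = \sum_{m=1}^{M} \alpha_m \mathbf{K}^{(m)}, \qquad \alpha_m \geq 0, \quad \sum_{m=1}^{M} \alpha_m = 1 .
```

Because a non-negative combination of positive semidefinite matrices is itself positive semidefinite, the mixture remains a valid kernel whenever every K^{(m)} is, so it could be passed directly to KPCA; moving the weights then moves the resulting embedding between those of the individual methods.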
4.3 DR-techniques Representation
Spectral DR methods can be represented as kernel matrices. Moreover, it has been demonstrated that, when incorporated into a KPCA algorithm, such kernel matrices reach the same low-dimensional spaces as those obtained by the original DR methods. Let us consider the following kernel representations for three well-known spectral DR approaches:
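The claim can be checked on the simplest case, CMDS: its kernel, -1/2 times the matrix of squared Euclidean distances, is double-centered and passed to KPCA, which recovers the classical MDS coordinates. This pure-Python sketch is an illustration under assumptions: the power-iteration eigensolver, the function names, and the weighted-sum `linear_im_kernel` helper (a hypothetical linear IM) are ours, and a real implementation would use a proper symmetric eigensolver.

```python
import math
import random

Matrix = list[list[float]]


def center_kernel(k: Matrix) -> Matrix:
    """Double-center a kernel matrix: H K H with H = I - (1/N) 1 1^T."""
    n = len(k)
    row_means = [sum(row) / n for row in k]
    col_means = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [[k[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
            for i in range(n)]


def top_eigenpairs(a: Matrix, dims: int, iters: int = 500, seed: int = 0):
    """Leading eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(a)
    a = [row[:] for row in a]  # copy, because the matrix is deflated in place
    rng = random.Random(seed)
    pairs = []
    for _ in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # the remaining spectrum is zero
                break
            v, lam = [x / norm for x in w], norm
        pairs.append((lam, v))
        # Deflation: A <- A - lam * v v^T removes the eigenpair just found.
        for i in range(n):
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return pairs


def kpca_embedding(k: Matrix, dims: int = 2) -> Matrix:
    """KPCA coordinates: row i holds sqrt(lambda_d) * v_d[i] for the top `dims` eigenpairs."""
    pairs = top_eigenpairs(center_kernel(k), dims)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(len(k))]


def cmds_kernel(points) -> Matrix:
    """CMDS kernel before centering: -1/2 times the squared Euclidean distances."""
    return [[-0.5 * math.dist(p, q) ** 2 for q in points] for p in points]


def linear_im_kernel(kernels: list[Matrix], weights: list[float]) -> Matrix:
    """Hypothetical linear IM: user-weighted sum of N x N kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
    k = cmds_kernel(pts)
    print(kpca_embedding(k, dims=1))
    print(kpca_embedding(linear_im_kernel([k, k], [0.25, 0.75]), dims=1))
```

For these three collinear points, the one-dimensional KPCA embedding of the CMDS kernel reproduces the original pairwise distances 1, 2 and 3 up to a sign flip, and mixing the kernel with itself under weights that sum to one leaves the embedding unchanged. Kernels for other spectral methods could be mixed in through the same helper.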
4.4 Use of IM in Interactive DR
5 Final Remarks
In this work, we have elegantly defined the concept of the so-called interaction model (IM). Such a definition opens the possibility of more formally developing new interactive data visualization approaches based on a mixture of dimensionality reduction techniques.
In future work, we will explore and/or develop novel kernel representations arising from other dimensionality reduction methods, as well as IM approaches that enable users to readily incorporate their knowledge and expertise into data exploration and visualization.
The authors acknowledge the research project “Desarrollo de una metodología de visualización interactiva y eficaz de información en Big Data” (Development of an interactive and effective methodology for information visualization in Big Data), supported by Agreement No. 180 of November 1st, 2016, by VIPRI of Universidad de Nariño.
The authors also thank the SDAS Research Group (www.sdas-group.com) for its valuable support.
- 3. Ward, M.O., Grinstein, G., Keim, D.: Interactive Data Visualization: Foundations, Techniques, and Applications. CRC Press (2010)
- 4. Peluffo-Ordóñez, D.H., Alvarado-Pérez, J.C., Lee, J.A., Verleysen, M., et al.: Geometrical homotopy for data visualization. In: European Symposium on Artificial Neural Networks (ESANN 2015). Computational Intelligence and Machine Learning (2015)
- 5. Salazar-Castro, J., Rosas-Narváez, Y., Pantoja, A., Alvarado-Pérez, J.C., Peluffo-Ordóñez, D.H.: Interactive interface for efficient data visualization via a geometric approach. In: 2015 20th Symposium on Signal Processing, Images and Computer Vision (STSIVA), pp. 1–6. IEEE (2015)
- 6. Rosero-Montalvo, P., et al.: Interactive data visualization using dimensionality reduction and similarity-based representations. In: Beltrán-Castañón, C., Nyström, I., Famili, F. (eds.) CIARP 2016. LNCS, vol. 10125, pp. 334–342. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-52277-7_41
- 7. Rosero-Montalvo, P.D., Peña-Unigarro, D.F., Peluffo, D.H., Castro-Silva, J.A., Umaquinga, A., Rosero-Rosero, E.A.: Data visualization using interactive dimensionality reduction and improved color-based interaction model. In: Ferrández Vicente, J.M., Álvarez-Sánchez, J.R., de la Paz López, F., Toledo Moreo, J., Adeli, H. (eds.) IWINAC 2017. LNCS, vol. 10338, pp. 289–298. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59773-7_30
- 8. Umaquinga-Criollo, A.C., Peluffo-Ordóñez, D.H., Rosero-Montalvo, P.D., Godoy-Trujillo, P.E., Benítez-Pereira, H.: Interactive visualization interfaces for big data analysis using combination of dimensionality reduction methods: a brief review. In: Basantes-Andrade, A., Naranjo-Toro, M., Zambrano Vizuete, M., Botto-Tobar, M. (eds.) TSIE 2019. AISC, vol. 1110, pp. 193–203. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-37221-7_17
- 10. Peluffo, D., Lee, J., Verleysen, M., Rodríguez-Sotelo, J., Castellanos-Domínguez, G.: Unsupervised relevance analysis for feature extraction and selection: a distance-based approach for feature relevance. In: International Conference on Pattern Recognition, Applications and Methods (ICPRAM) (2014)
- 11. Cao, H., Bernard, S., Heutte, L., Sabourin, R.: Dissimilarity-based representation for radiomics applications. CoRR abs/1803.04460 (2018)
- 15. Peluffo-Ordóñez, D.H., Lee, J.A., Verleysen, M.: Generalized kernel framework for unsupervised spectral methods of dimensionality reduction. In: 2014 IEEE Symposium on Computational Intelligence and Data Mining (CIDM), pp. 171–177. IEEE (2014)
- 16. Peluffo-Ordóñez, D.H., Lee, J.A., Verleysen, M.: Short review of dimensionality reduction methods based on stochastic neighbour embedding. In: Villmann, T., Schleif, F.-M., Kaden, M., Lange, M. (eds.) Advances in Self-Organizing Maps and Learning Vector Quantization. AISC, vol. 295, pp. 65–74. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07695-9_6
- 18. Zhang, Z., Wang, J.: MLLE: modified locally linear embedding using multiple weights. In: Advances in Neural Information Processing Systems, pp. 1593–1600 (2007)
- 19. Hinton, G.E., Roweis, S.T.: Stochastic neighbor embedding. In: Advances in Neural Information Processing Systems, pp. 857–864 (2003)
- 20. Ham, J., Lee, D.D., Mika, S., Schölkopf, B.: A kernel view of the dimensionality reduction of manifolds. In: Proceedings of the Twenty-First International Conference on Machine Learning, p. 47. ACM (2004)