
Comparing dimensionality reduction techniques for visual analysis of the LSTM hidden activity on multi-dimensional time series modeling

Abstract

Long short-term memory (LSTM) networks are widely applied to multi-dimensional time series modeling to solve many real-world problems, and visual analytics plays a crucial role in improving their interpretability. To understand the high-dimensional activations in the hidden layers of the model, dimensionality reduction (DR) techniques are essential. However, the diversity of DR techniques greatly increases the difficulty of selecting among them. In this paper, focusing on the applicability of DR techniques to visual analysis of LSTM hidden activity in multi-dimensional time series modeling, we select four representative DR techniques for comparison: principal component analysis (PCA), multi-dimensional scaling (MDS), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). The original continuous modeling data and the symbolically processed discrete data are used as the knowledge learned by the model and are associated with the LSTM hidden layer activity, and we compare the ability of the DR techniques to preserve the high-dimensional information of the hidden layer activations. According to the model structure of LSTM and the characteristics of the modeling data, controlled experiments were carried out on five typical tasks: the quality evaluation of DR, the abstract representation of high and low hidden layers, the association analysis between model and output variable, the importance analysis of input features, and the exploration of temporal regularity. Through the complete experimental process and detailed result analysis, we distill systematic guidance to help analysts select appropriate and effective DR techniques for visual analytics of LSTM.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. The ETT dataset was acquired at https://paperswithcode.com/dataset/ett

References

  1. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735

  2. Gunning, D., Aha, D.: DARPA's explainable artificial intelligence (XAI) program. AI Mag. 40(2), 44–58 (2019). https://doi.org/10.1609/aimag.v40i2.2850

  3. Lipton, Z.C., Berkowitz, J., Elkan, C.: A critical review of recurrent neural networks for sequence learning. arXiv:1506.00019 (2015)

  4. Chu, Y., Fei, J., Hou, S.: Adaptive global sliding-mode control for dynamic systems using double hidden layer recurrent neural network structure. IEEE Trans. Neural Netw. Learn. Syst. (2020). https://doi.org/10.1109/TNNLS.2019.2919676

  5. Bäuerle, A., Albus, P., Störk, R., Seufert, T., Ropinski, T.: Explornn: teaching recurrent neural networks through visual exploration. Visual Comput. (2023). https://doi.org/10.1007/s00371-022-02593-0

  6. Liu, S., Maljovec, D., Wang, B., Bremer, P.T., Pascucci, V.: Visualizing high-dimensional data: advances in the past decade. IEEE Trans. Visual Comput. Graphics 23(3), 1249–1268 (2017). https://doi.org/10.1109/TVCG.2016.2640960

  7. Ali, M., Jones, M.W., Xie, X., Williams, M.: Timecluster: dimension reduction applied to temporal data for visual analytics. Vis. Comput. 35(6–8), 1013–1026 (2019). https://doi.org/10.1007/s00371-019-01673-y

  8. Ballester-Ripoll, R., Halter, G., Pajarola, R.: High-dimensional scalar function visualization using principal parameterizations. Visual Comput. (2023). https://doi.org/10.1007/s00371-023-02937-4

  9. La Rosa, B., Blasilli, G., Bourqui, R., Auber, D., Santucci, G., Capobianco, R., Bertini, E., Giot, R., Angelini, M.: State of the art of visual analytics for explainable deep learning. In: Pierre, A., Helwig, H. (eds.) Computer graphics forum, vol. 42, pp. 319–355. Wiley, London (2023)

  10. Zhao, Y., Luo, F., Chen, M., Wang, Y., Xia, J., Zhou, F., Wang, Y., Chen, Y., Chen, W.: Evaluating multi-dimensional visualizations for understanding fuzzy clusters. IEEE Trans. Visual Comput. Graphics 25(1), 12–21 (2019). https://doi.org/10.1109/TVCG.2018.2865020

  11. Strobelt, H., Gehrmann, S., Pfister, H., Rush, A.M.: LSTMVis: a tool for visual analysis of hidden state dynamics in recurrent neural networks. IEEE Trans. Visual Comput. Graphics 24(1), 667–676 (2018). https://doi.org/10.1109/TVCG.2017.2744158

  12. Hohman, F., Kahng, M., Pienta, R., Chau, D.H.: Visual analytics in deep learning: an interrogative survey for the next frontiers. IEEE Trans. Visual Comput. Graphics 25(8), 2674–2693 (2019). https://doi.org/10.1109/TVCG.2018.2843369

  13. Alicioglu, G., Sun, B.: A survey of visual analytics for explainable artificial intelligence methods. Comput. Graph. 102, 502–520 (2022)

  14. Wold, S., Esbensen, K., Geladi, P.: Principal component analysis. Chemom. Intell. Lab. Syst. 2(1), 37–52 (1987). https://doi.org/10.1016/0169-7439(87)80084-9

  15. Cox, M.A.A., Cox, T.F.: Multidimensional scaling, pp. 315–347. Springer, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-33037-0_14

  16. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)

  17. McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018). https://doi.org/10.48550/arXiv.1802.03426

  18. Van der Maaten, L., Postma, E., Herik, H.: Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10, 66–71 (2007)

  19. Jia, W., Sun, M., Lian, J., Hou, S.: Feature dimensionality reduction: a review. Complex Intell. Syst. 8(3), 2663–2693 (2022). https://doi.org/10.1007/s40747-021-00637-x

  20. De Lorenzo, A., Medvet, E., Tušar, T., Bartoli, A.: An analysis of dimensionality reduction techniques for visualizing evolution. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO '19, pp. 1864–1872. Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3319619.3326868

  21. Xia, J., Zhang, Y., Song, J., Chen, Y., Wang, Y., Liu, S.: Revisiting dimensionality reduction techniques for visual cluster analysis: an empirical study. IEEE Trans. Visual Comput. Graphics 28(1), 529–539 (2022). https://doi.org/10.1109/TVCG.2021.3114694

  22. Ayesha, S., Hanif, M.K., Talib, R.: Overview and comparative study of dimensionality reduction techniques for high dimensional data. Inform. Fusion 59, 44–58 (2020). https://doi.org/10.1016/j.inffus.2020.01.005

  23. Armstrong, G., Rahman, G., Martino, C., McDonald, D., Gonzalez, A., Mishne, G., Knight, R.: Applications and comparison of dimensionality reduction methods for microbiome data. Front. Bioinform. (2022). https://doi.org/10.3389/fbinf.2022.821861

  24. Jain, R., Kumar, A., Nayyar, A., Dewan, K., Garg, R., Raman, S., Ganguly, S.: Explaining sentiment analysis results on social media texts through visualization. Multimed. Tools Appl. 82(15), 22613–22629 (2023). https://doi.org/10.1007/s11042-023-14432-y

  25. Holzinger, A.: The next frontier: AI we can really trust. In: Proceedings of the ECML PKDD 2021, pp. 427–440 (2021). https://doi.org/10.1007/978-3-030-93736-2_33

  26. Holzinger, A., Dehmer, M., Emmert-Streib, F., Cucchiara, R., Augenstein, I., Del Ser, J., Samek, W., Jurisica, I., Díaz-Rodríguez, N.: Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence. Inform. Fusion 79, 263–278 (2022). https://doi.org/10.1016/j.inffus.2021.10.007

  27. Choo, J., Liu, S.: Visual analytics for explainable deep learning. IEEE Comput. Graphics Appl. 38(4), 84–92 (2018). https://doi.org/10.1109/MCG.2018.042731661

  28. Ras, G., Xie, N., Van Gerven, M., Doran, D.: Explainable deep learning: a field guide for the uninitiated. J. Artif. Intell. Res. 73, 329–396 (2022)

  29. Zahavy, T., Ben-Zrihem, N., Mannor, S.: Graying the black box: understanding DQNs. In: International Conference on Machine Learning, pp. 1899–1908. PMLR (2016). http://proceedings.mlr.press/v48/zahavy16.html

  30. Gabella, M., Afambo, N., Ebli, S., Spreemann, G.: Topology of learning in artificial neural networks (2019). https://doi.org/10.48550/arXiv.1902.08160

  31. Rauber, P.E., Fadel, S.G., Falcão, A.X., Telea, A.C.: Visualizing the hidden activity of artificial neural networks. IEEE Trans. Visual Comput. Graph. 23(1), 101–110 (2017). https://doi.org/10.1109/TVCG.2016.2598838

  32. Tang, Z., Shi, Y., Wang, D., Feng, Y., Zhang, S.: Memory visualization for gated recurrent neural networks in speech recognition. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2736–2740 (2017). https://doi.org/10.1109/ICASSP.2017.7952654

  33. Shen, Q., Wu, Y., Jiang, Y., Zeng, W., Lau, A.K.H., Vianova, A., Qu, H.: Visual interpretation of recurrent neural network on multi-dimensional time-series forecast. In: 2020 IEEE Pacific Visualization Symposium (PacificVis), pp. 61–70 (2020). https://doi.org/10.1109/PacificVis48177.2020.2785

  34. Ji, L., Yang, Y., Qiu, S., et al.: Visual analytics of rnn for thermal power control system identification. J. Comput. Aided Design Comput. Graph. 33(12), 1876–1886 (2021)

  35. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)

  36. Espadoto, M., Martins, R.M., Kerren, A., Hirata, N.S., Telea, A.C.: Toward a quantitative survey of dimension reduction techniques. IEEE Trans. Visual Comput. Graph. 27(3), 2153–2173 (2019). https://doi.org/10.1109/TVCG.2019.2944182

  37. Martins, R.M., Coimbra, D.B., Minghim, R., Telea, A.: Visual analysis of dimensionality reduction quality for parameterized projections. Comput. Graph. 41, 26–42 (2014). https://doi.org/10.1016/j.cag.2014.01.006

  38. Gracia, A., González, S., Robles, V., Menasalvas, E.: A methodology to compare dimensionality reduction algorithms in terms of loss of quality. Inform. Sci. 270, 1–27 (2014). https://doi.org/10.1016/j.ins.2014.02.068

  39. Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing sax: a novel symbolic representation of time series. Data Min. Knowl. Disc. 15(2), 107–144 (2007). https://doi.org/10.1007/s10618-007-0064-z

  40. Karo, I.M.K., Maulana Adhinugraha, K., Huda, A.F.: A cluster validity for spatial clustering based on Davies–Bouldin index and polygon dissimilarity function. In: 2017 Second International Conference on Informatics and Computing (ICIC), pp. 1–6 (2017). https://doi.org/10.1109/IAC.2017.8280572

  41. Natsukawa, H., Deyle, E.R., Pao, G.M., Koyamada, K., Sugihara, G.: A visual analytics approach for ecosystem dynamics based on empirical dynamic modeling. IEEE Trans. Visual Comput. Graph. 27(2), 506–516 (2021). https://doi.org/10.1109/TVCG.2020.3028956

  42. Kindlmann, G., Scheidegger, C.: An algebraic process for visualization design. IEEE Trans. Visual Comput. Graph. 20(12), 2181–2190 (2014). https://doi.org/10.1109/TVCG.2014.2346325

  43. Paulovich, F.V., Nonato, L.G., Minghim, R., Levkowitz, H.: Least square projection: a fast high-precision multidimensional projection technique and its application to document mapping. IEEE Trans. Visual Comput. Graph. 14(3), 564–575 (2008). https://doi.org/10.1109/TVCG.2007.70443

  44. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)

  45. Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., Zhang, W.: Informer: Beyond efficient transformer for long sequence time-series forecasting. In: Proceedings of the AAAI conference on artificial intelligence, vol. 35, pp. 11106–11115 (2021)

Acknowledgements

This work was supported by the NSFC under Grant No. 60873093 and the Strategic Cooperation Technology Projects of CNPC and CUPB (ZLZX2020-05).

Author information

Correspondence to Lianen Ji.

Ethics declarations

Conflict of interest

Lianen Ji declares that he has no conflict of interest. Shirong Qiu declares that he has no conflict of interest. Zhi Xu declares that he has no conflict of interest. Yue Liu declares that she has no conflict of interest. Guang Yang declares that he has no conflict of interest.

Ethical approval

This work is original research that has not been published before and is not under consideration for publication elsewhere.

Human or animal rights

This article does not include any studies of humans or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Model description

In this section, we introduce the working mechanism of the simple recurrent neural network (SRN) [44] and the basic long short-term memory (LSTM).

The SRN represents the most basic form of recurrent neural networks (RNNs) and serves as a fundamental building block for more advanced variants. In the forward propagation process, the hidden state at time step t, \(h_t\), is computed from the hidden state at time step \(t-1\), \(h_{t-1}\), the input at time step t, \(x_t\), and the learnable parameters (U, W, b), as follows:

$$\begin{aligned} h_t=\sigma (Ux_t+Wh_{t-1}+b). \end{aligned}$$
(6)
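
To make the recurrence concrete, here is a minimal NumPy sketch of Eq. (6), unrolled over a toy sequence; \(\sigma\) is taken to be tanh, and all dimensions are illustrative rather than taken from the paper.

```python
import numpy as np

def srn_step(x_t, h_prev, U, W, b):
    """One forward step of the SRN (Eq. 6), with sigma = tanh."""
    return np.tanh(U @ x_t + W @ h_prev + b)

# Unroll the recurrence over a toy sequence, starting from a zero state.
input_dim, hidden_dim, T = 7, 100, 24          # illustrative sizes
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h_t = np.zeros(hidden_dim)
for x_t in rng.normal(size=(T, input_dim)):
    h_t = srn_step(x_t, h_t, U, W, b)          # h_t depends on all past inputs
```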

When sequences are long, the SRN suffers from vanishing or exploding gradients, which makes training it with gradient descent very difficult. Hochreiter and Schmidhuber [1] proposed LSTM to solve this problem, enabling RNNs to be applied to long-sequence time series. Each LSTM hidden layer neuron has three gate controllers, namely the input gate \(i_t\), forget gate \(f_t\), and output gate \(o_t\). The core component of LSTM is the memory cell, which possesses an internal state, referred to as the cell state \(c_t\), for storing and transmitting information. The input gate controls the inflow of information, the forget gate controls the retention of the hidden units' historical state information, and the output gate controls the outflow of information. The process can be expressed as

$$\begin{aligned} \left\{ \begin{array}{l} i_t=\sigma (U_\textrm{i}x_t+W_\textrm{i}h_{t-1}+V_\textrm{i}c_{t-1}+b_\textrm{i}) \\ f_t=\sigma (U_\textrm{f}x_t+W_\textrm{f}h_{t-1}+V_\textrm{f}c_{t-1}+b_\textrm{f}) \\ c_t=f_t\cdot c_{t-1}+i_t\cdot \tanh (U_\textrm{c}x_t+W_\textrm{c}h_{t-1}+b_\textrm{c}) \\ o_t=\sigma (U_\textrm{o}x_t+W_\textrm{o}h_{t-1}+V_\textrm{o}c_{t-1}+b_\textrm{o}) \\ h_t=o_t\cdot \tanh (c_t) \end{array}\right. \end{aligned}$$
(7)

where U, W, and V are weight matrices (the V terms connect the previous cell state to the gates) and b is a bias vector; \(h_t\) is the activation value of the hidden layer output.
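
A minimal NumPy sketch of one step of Eq. (7) follows. The \(V\) terms act on the previous cell state and are treated here as elementwise (diagonal) weights, a common convention for such peephole-style connections; the parameter layout in a plain dict is an illustrative assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of the LSTM cell in Eq. (7).

    p holds the parameters: input weights U_*, recurrent weights W_*,
    cell-state weights V_* (elementwise here), and biases b_* for the
    gates i, f, o and the cell candidate c.
    """
    i_t = sigmoid(p["U_i"] @ x_t + p["W_i"] @ h_prev + p["V_i"] * c_prev + p["b_i"])
    f_t = sigmoid(p["U_f"] @ x_t + p["W_f"] @ h_prev + p["V_f"] * c_prev + p["b_f"])
    c_t = f_t * c_prev + i_t * np.tanh(p["U_c"] @ x_t + p["W_c"] @ h_prev + p["b_c"])
    o_t = sigmoid(p["U_o"] @ x_t + p["W_o"] @ h_prev + p["V_o"] * c_prev + p["b_o"])
    h_t = o_t * np.tanh(c_t)                   # hidden activation passed upward
    return h_t, c_t
```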

Appendix B: Dataset description

Fig. 10 Temporal fluctuation curves of the normalized predicted variables in the ETT dataset and HST dataset

The electricity transformer temperature (ETT) is a crucial indicator in the long-term deployment of electric power. The ETT dataset introduced by Zhou et al. [45] is a popular dataset for time series forecasting (TSF) tasks; it is publicly available at https://paperswithcode.com/dataset/ett. The dataset consists of two years of data from two separate counties in China, and the sampling interval is 1 min. Each data point consists of the target value "oil temperature" and six power load features. Figure 10 shows the temporal fluctuation curves of the normalized predicted variables in the ETT dataset and HST dataset. A total of 18,000 samples are taken as the training set, 1,000 as the validation set, and 1,000 as the test set. We selected this dataset for new multi-dimensional time series modeling, aiming to compare the applicability of dimensionality reduction (DR) techniques in various analysis tasks.
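
For illustration, a minimal sketch of how such a normalized, windowed split might be prepared; the file name, window length, and next-step target are illustrative assumptions rather than the exact preprocessing used in the paper (in the public ETT CSVs, the target column "OT" comes last).

```python
import numpy as np
import pandas as pd

df = pd.read_csv("ETTh1.csv", index_col="date", parse_dates=True)  # hypothetical local copy
values = df.to_numpy(dtype=np.float32)       # 6 load features + "OT" target (last column)

# Normalize with training-set statistics only, then cut sliding windows.
n_train, n_val, n_test, window = 18_000, 1_000, 1_000, 24
mean, std = values[:n_train].mean(axis=0), values[:n_train].std(axis=0)
values = (values - mean) / std

def make_windows(arr, start, n, window):
    """n windows of length `window` starting at `start`, with next-step targets."""
    X = np.stack([arr[s : s + window] for s in range(start, start + n)])
    y = arr[start + window : start + window + n, -1]   # next value of the target
    return X, y

X_train, y_train = make_windows(values, 0, n_train, window)
X_val, y_val = make_windows(values, n_train, n_val, window)
X_test, y_test = make_windows(values, n_train + n_val, n_test, window)
```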

Fig. 11 Association changes between the model’s hidden layers and output variable

Appendix C: Additional case study with the ETT dataset and the basic LSTM

Here, we use the basic LSTM for ETT prediction and extract the activations of its hidden layers. The selected LSTM architecture consists of two hidden layers, each comprising 100 neurons. As before, we conducted comparative experiments on the DR techniques across the previously proposed visual analysis tasks. All projections presented were created from the activations of a test set subset; inspecting a training set subset provides similar insights.
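
The following sketch illustrates how such projections can be produced with off-the-shelf implementations (scikit-learn for PCA, MDS, and t-SNE; the umap-learn package for UMAP); the activation file name and the parameter values shown are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE
import umap  # from the umap-learn package

# Hypothetical file of hidden activations extracted from the trained LSTM
# on the test subset, shape (n_samples, 100) for a 100-neuron layer.
activations = np.load("lstm_layer2_activations_test.npy")

projections = {
    "PCA": PCA(n_components=2).fit_transform(activations),
    "MDS": MDS(n_components=2).fit_transform(activations),
    "t-SNE": TSNE(n_components=2, perplexity=30).fit_transform(activations),
    "UMAP": umap.UMAP(n_components=2, n_neighbors=15).fit_transform(activations),
}
```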

Fig. 12 Association between activation state and symbolic output variables (“A,” “C,” and “E”)

Table 5 Statistical results of feature importance

The abstract representation of the high and low hidden layers. As shown in Fig. 11, projecting the two hidden layers with t-SNE and UMAP is not conducive to identifying their abstraction ability, as the activation shapes in the projection views of the two layers differ little. After PCA processing, however, the projection points of the low hidden layer show a uniformly distributed circular shape, which cannot separate different types of samples well, while the projection points of the high hidden layer show a distribution with obvious angles and edges, revealing more complex and nonlinear inter-layer abstract representations. Unlike in previous experiments, we do not observe clustered projection scatters for PCA and MDS in the low hidden layer, possibly because the time series fluctuation of the predicted variable in the ETT dataset is more intense than in the HST dataset, as shown in Fig. 10.

The association analysis between model and output variable. As shown in Fig. 11, when the associated object of the activations is continuous, PCA and MDS produce smoother and more continuous scatter distributions than the other DR techniques. Over the course of training, the spatial projection views of all four DR techniques show that the separation of the sample points gradually improves. However, only the projection transformation of PCA matches the magnitude of the actual activation changes, which reflects the advantage of linear DR techniques. Additionally, we use the same method to obtain discrete symbol datasets and project the various classes of samples. The results depicted in Fig. 12 demonstrate that t-SNE and UMAP yield superior separation in this setting.

The importance analysis of input features. As shown in Table 6, we calculate the correlation coefficients between the input variables and the predicted variable. \(X_2\) has the strongest correlation with the predicted variable, followed by \(X_4\) or \(X_6\). Table 5 shows the statistical results of the feature importance obtained by the DR techniques under different parameter settings. We find that the stronger the correlation between an input variable and the predicted variable, the greater its feature importance, especially for DR techniques with a strong ability to preserve the global neighborhood. Therefore, consistent with the previous experiments, PCA and MDS perform better in this task.

Table 6 Statistical results of the correlation coefficient between the input variables and the predicted variable
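
A minimal sketch of the computation behind such a correlation table, assuming the Pearson coefficient (the specific coefficient used is not stated here):

```python
import numpy as np

def feature_target_correlation(X, y):
    """Pearson correlation of each input feature column with the target."""
    return np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# e.g., with the windowed ETT data sketched in Appendix B, using the input
# features at the last time step of every test window:
# corr = feature_target_correlation(X_test[:, -1, :6], y_test)
```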

The exploration of temporal regularity. After observing the temporal projection views of numerous samples, we noticed that PCA and MDS exhibit more low-quality inflection points, making their projection trajectories less smooth compared to t-SNE and UMAP. Figure 13 also confirms that the appearance of inflection points is associated with the local neighborhood preservation ability of these techniques, which highlights the advantage of t-SNE and UMAP in this task.

Fig. 13 Comparison result of the deviation degree of nodes with different local errors in the temporal trajectory

Since we selected a new dataset and model structure in the experiment, the activation pattern characteristics of the hidden layers we observed have also changed to a certain extent. Nevertheless, the DR techniques still show similar performance to previous experiments in LSTM visual analysis tasks, which reinforces the generalizability of the guidelines proposed in this paper.
