
Comparing dimensionality reduction techniques for visual analysis of the LSTM hidden activity on multi-dimensional time series modeling

Abstract

Long short-term memory (LSTM) networks are widely applied to multi-dimensional time series modeling to solve many real-world problems, and visual analytics plays a crucial role in improving their interpretability. To understand the high-dimensional activations in the hidden layers of the model, dimensionality reduction (DR) techniques are essential. However, the diversity of DR techniques greatly increases the difficulty of selecting among them. In this paper, focusing on the applicability of DR techniques to visual analysis of LSTM hidden activity in multi-dimensional time series modeling, we select four representative DR techniques for comparison: principal component analysis (PCA), multi-dimensional scaling (MDS), t-distributed stochastic neighbor embedding (t-SNE), and uniform manifold approximation and projection (UMAP). The original continuous modeling data and the symbolically processed discrete data are used as the knowledge learned by the model and are associated with the LSTM hidden layer activity, and we compare the ability of the DR techniques to preserve the high-dimensional information of the hidden layer activations. According to the model structure of LSTM and the characteristics of the modeling data, controlled experiments were carried out on five typical tasks: the quality evaluation of DR, the abstract representation of high and low hidden layers, the association analysis between model and output variable, the importance analysis of input features, and the exploration of temporal regularity. Through the complete experimental process and detailed result analysis, we distill systematic guidance to help analysts select appropriate and effective DR techniques for visual analytics of LSTM.

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. The ETT dataset was acquired at https://paperswithcode.com/dataset/ett

References

  1. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735

  2. Gunning, D., Aha, D.: DARPA's explainable artificial intelligence (XAI) program. AI Mag. 40(2), 44–58 (2019). https://doi.org/10.1609/aimag.v40i2.2850

  3. Lipton, Z.C., Berkowitz, J., Elkan, C.: A critical review of recurrent neural networks for sequence learning. arXiv:1506.00019 (2015)

  4. Chu, Y., Fei, J., Hou, S.: Adaptive global sliding-mode control for dynamic systems using double hidden layer recurrent neural network structure. IEEE Trans. Neural Netw. Learn. Syst. (2020). https://doi.org/10.1109/TNNLS.2019.2919676

  5. Bäuerle, A., Albus, P., Störk, R., Seufert, T., Ropinski, T.: Explornn: teaching recurrent neural networks through visual exploration. Visual Comput. (2023). https://doi.org/10.1007/s00371-022-02593-0

  6. Liu, S., Maljovec, D., Wang, B., Bremer, P.T., Pascucci, V.: Visualizing high-dimensional data: advances in the past decade. IEEE Trans. Visual Comput. Graphics 23(3), 1249–1268 (2017). https://doi.org/10.1109/TVCG.2016.2640960

  7. Ali, M., Jones, M.W., Xie, X., Williams, M.: Timecluster: dimension reduction applied to temporal data for visual analytics. Vis. Comput. 35(6–8), 1013–1026 (2019). https://doi.org/10.1007/s00371-019-01673-y

  8. Ballester-Ripoll, R., Halter, G., Pajarola, R.: High-dimensional scalar function visualization using principal parameterizations. Visual Comput. (2023). https://doi.org/10.1007/s00371-023-02937-4

  9. La Rosa, B., Blasilli, G., Bourqui, R., Auber, D., Santucci, G., Capobianco, R., Bertini, E., Giot, R., Angelini, M.: State of the art of visual analytics for explainable deep learning. In: Pierre, A., Helwig, H. (eds.) Computer graphics forum, vol. 42, pp. 319–355. Wiley, London (2023)

  10. Zhao, Y., Luo, F., Chen, M., Wang, Y., Xia, J., Zhou, F., Wang, Y., Chen, Y., Chen, W.: Evaluating multi-dimensional visualizations for understanding fuzzy clusters. IEEE Trans. Visual Comput. Graphics 25(1), 12–21 (2019). https://doi.org/10.1109/TVCG.2018.2865020

  11. Strobelt, H., Gehrmann, S., Pfister, H., Rush, A.M.: LSTMVis: a tool for visual analysis of hidden state dynamics in recurrent neural networks. IEEE Trans. Visual Comput. Graphics 24(1), 667–676 (2018). https://doi.org/10.1109/TVCG.2017.2744158

  12. Hohman, F., Kahng, M., Pienta, R., Chau, D.H.: Visual analytics in deep learning: an interrogative survey for the next frontiers. IEEE Trans. Visual Comput. Graphics 25(8), 2674–2693 (2019). https://doi.org/10.1109/TVCG.2018.2843369

  13. Alicioglu, G., Sun, B.: A survey of visual analytics for explainable artificial intelligence methods. Comput. Graph. 102, 502–520 (2022)

  14. Wold, S., Esbensen, K., Geladi, P.: Principal component analysis. Chemom. Intell. Lab. Syst. 2(1), 37–52 (1987). https://doi.org/10.1016/0169-7439(87)80084-9

  15. Cox, M.A.A., Cox, T.F.: Multidimensional scaling, pp. 315–347. Springer, Berlin, Heidelberg (2008). https://doi.org/10.1007/978-3-540-33037-0_14

  16. Van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(11), 2579–2605 (2008)

  17. McInnes, L., Healy, J., Melville, J.: UMAP: uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426 (2018). https://doi.org/10.48550/arXiv.1802.03426

  18. Van der Maaten, L., Postma, E., Herik, H.: Dimensionality reduction: a comparative review. J. Mach. Learn. Res. 10, 66–71 (2007)

  19. Jia, W., Sun, M., Lian, J., Hou, S.: Feature dimensionality reduction: a review. Complex Intell. Syst. 8(3), 2663–2693 (2022). https://doi.org/10.1007/s40747-021-00637-x

  20. De Lorenzo, A., Medvet, E., Tušar, T., Bartoli, A.: An analysis of dimensionality reduction techniques for visualizing evolution. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO '19, pp. 1864–1872. Association for Computing Machinery, New York, NY, USA (2019). https://doi.org/10.1145/3319619.3326868

  21. Xia, J., Zhang, Y., Song, J., Chen, Y., Wang, Y., Liu, S.: Revisiting dimensionality reduction techniques for visual cluster analysis: an empirical study. IEEE Trans. Visual Comput. Graphics 28(1), 529–539 (2022). https://doi.org/10.1109/TVCG.2021.3114694

  22. Ayesha, S., Hanif, M.K., Talib, R.: Overview and comparative study of dimensionality reduction techniques for high dimensional data. Inform. Fusion 59, 44–58 (2020). https://doi.org/10.1016/j.inffus.2020.01.005

  23. Armstrong, G., Rahman, G., Martino, C., McDonald, D., Gonzalez, A., Mishne, G., Knight, R.: Applications and comparison of dimensionality reduction methods for microbiome data. Front. Bioinform. (2022). https://doi.org/10.3389/fbinf.2022.821861

  24. Jain, R., Kumar, A., Nayyar, A., Dewan, K., Garg, R., Raman, S., Ganguly, S.: Explaining sentiment analysis results on social media texts through visualization. Multimed. Tools Appl. 82(15), 22613–22629 (2023). https://doi.org/10.1007/s11042-023-14432-y

  25. Holzinger, A.: The next frontier: AI we can really trust. In: Proceedings of the ECML PKDD 2021, pp. 427–440 (2021). https://doi.org/10.1007/978-3-030-93736-2_33

  26. Holzinger, A., Dehmer, M., Emmert-Streib, F., Cucchiara, R., Augenstein, I., Del Ser, J., Samek, W., Jurisica, I., Díaz-Rodríguez, N.: Information fusion as an integrative cross-cutting enabler to achieve robust, explainable, and trustworthy medical artificial intelligence. Inform. Fusion 79, 263–278 (2022). https://doi.org/10.1016/j.inffus.2021.10.007

  27. Choo, J., Liu, S.: Visual analytics for explainable deep learning. IEEE Comput. Graphics Appl. 38(4), 84–92 (2018). https://doi.org/10.1109/MCG.2018.042731661

  28. Ras, G., Xie, N., Van Gerven, M., Doran, D.: Explainable deep learning: a field guide for the uninitiated. J. Artif. Intell. Res. 73, 329–396 (2022)

  29. Zahavy, T., Ben-Zrihem, N., Mannor, S.: Graying the black box: understanding DQNs. In: International Conference on Machine Learning, pp. 1899–1908. PMLR (2016). http://proceedings.mlr.press/v48/zahavy16.html

  30. Gabella, M., Afambo, N., Ebli, S., Spreemann, G.: Topology of learning in artificial neural networks (2019). https://doi.org/10.48550/arXiv.1902.08160

  31. Rauber, P.E., Fadel, S.G., Falcão, A.X., Telea, A.C.: Visualizing the hidden activity of artificial neural networks. IEEE Trans. Visual Comput. Graph. 23(1), 101–110 (2017). https://doi.org/10.1109/TVCG.2016.2598838

  32. Tang, Z., Shi, Y., Wang, D., Feng, Y., Zhang, S.: Memory visualization for gated recurrent neural networks in speech recognition. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2736–2740 (2017). https://doi.org/10.1109/ICASSP.2017.7952654

  33. Shen, Q., Wu, Y., Jiang, Y., Zeng, W., Lau, A.K.H., Vianova, A., Qu, H.: Visual interpretation of recurrent neural network on multi-dimensional time-series forecast. In: 2020 IEEE Pacific Visualization Symposium (PacificVis), pp. 61–70 (2020). https://doi.org/10.1109/PacificVis48177.2020.2785

  34. Ji, L., Yang, Y., Qiu, S., et al.: Visual analytics of rnn for thermal power control system identification. J. Comput. Aided Design Comput. Graph. 33(12), 1876–1886 (2021)

  35. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. Advances in neural information processing systems 30 (2017)

  36. Espadoto, M., Martins, R.M., Kerren, A., Hirata, N.S., Telea, A.C.: Toward a quantitative survey of dimension reduction techniques. IEEE Trans. Visual Comput. Graph. 27(3), 2153–2173 (2019). https://doi.org/10.1109/TVCG.2019.2944182

  37. Martins, R.M., Coimbra, D.B., Minghim, R., Telea, A.: Visual analysis of dimensionality reduction quality for parameterized projections. Comput. Graph. 41, 26–42 (2014). https://doi.org/10.1016/j.cag.2014.01.006

  38. Gracia, A., González, S., Robles, V., Menasalvas, E.: A methodology to compare dimensionality reduction algorithms in terms of loss of quality. Inform. Sci. 270, 1–27 (2014). https://doi.org/10.1016/j.ins.2014.02.068

  39. Lin, J., Keogh, E., Wei, L., Lonardi, S.: Experiencing sax: a novel symbolic representation of time series. Data Min. Knowl. Disc. 15(2), 107–144 (2007). https://doi.org/10.1007/s10618-007-0064-z

  40. Karo, I.M.K., Maulana Adhinugraha, K., Huda, A.F.: A cluster validity for spatial clustering based on Davies–Bouldin index and polygon dissimilarity function. In: 2017 Second International Conference on Informatics and Computing (ICIC), pp. 1–6 (2017). https://doi.org/10.1109/IAC.2017.8280572

  41. Natsukawa, H., Deyle, E.R., Pao, G.M., Koyamada, K., Sugihara, G.: A visual analytics approach for ecosystem dynamics based on empirical dynamic modeling. IEEE Trans. Visual Comput. Graph. 27(2), 506–516 (2021). https://doi.org/10.1109/TVCG.2020.3028956

  42. Kindlmann, G., Scheidegger, C.: An algebraic process for visualization design. IEEE Trans. Visual Comput. Graph. 20(12), 2181–2190 (2014). https://doi.org/10.1109/TVCG.2014.2346325

  43. Paulovich, F.V., Nonato, L.G., Minghim, R., Levkowitz, H.: Least square projection: a fast high-precision multidimensional projection technique and its application to document mapping. IEEE Trans. Visual Comput. Graph. 14(3), 564–575 (2008). https://doi.org/10.1109/TVCG.2007.70443

  44. Elman, J.L.: Finding structure in time. Cogn. Sci. 14(2), 179–211 (1990)

  45. Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., Zhang, W.: Informer: Beyond efficient transformer for long sequence time-series forecasting. In: Proceedings of the AAAI conference on artificial intelligence, vol. 35, pp. 11106–11115 (2021)

Acknowledgements

This work was supported by the NSFC under Grant No. 60873093 and the Strategic Cooperation Technology Projects of CNPC and CUPB (ZLZX2020-05).

Author information

Correspondence to Lianen Ji.

Ethics declarations

Conflict of interest

Lianen Ji declares that he has no conflict of interest. Shirong Qiu declares that he has no conflict of interest. Zhi Xu declares that he has no conflict of interest. Yue Liu declares that she has no conflict of interest. Guang Yang declares that he has no conflict of interest.

Ethical approval

This work is original research that has not been published before and is not under consideration for publication elsewhere.

Human or animal rights

This article does not include any studies of humans or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Model description

In this section, we introduce the working mechanism of the simple recurrent neural network (SRN) [44] and the basic long short-term memory (LSTM).

The SRN represents the most basic form of recurrent neural networks (RNNs) and serves as a fundamental building block for more advanced variants. In the forward propagation process, the hidden state at time step t, \(h_t\), is computed from the hidden state at time step \(t-1\), \(h_{t-1}\), the input at time step t, \(x_t\), and the learnable parameters (U, W, b), as follows:

$$\begin{aligned} h_t=\sigma (Ux_t+Wh_{t-1}+b). \end{aligned}$$
(6)
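
To make the recurrence concrete, here is a minimal NumPy sketch of Eq. (6), unrolled over a toy sequence; \(\sigma\) is taken to be tanh, and all dimensions are illustrative rather than taken from the paper.

```python
import numpy as np

def srn_step(x_t, h_prev, U, W, b):
    """One forward step of the SRN (Eq. 6), with sigma = tanh."""
    return np.tanh(U @ x_t + W @ h_prev + b)

# Unroll the recurrence over a toy sequence, starting from a zero state.
input_dim, hidden_dim, T = 7, 100, 24          # illustrative sizes
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

h_t = np.zeros(hidden_dim)
for x_t in rng.normal(size=(T, input_dim)):
    h_t = srn_step(x_t, h_t, U, W, b)          # h_t depends on all past inputs
```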

When sequences are long, the SRN suffers from vanishing or exploding gradients, which makes training it with gradient descent very difficult. Hochreiter and Schmidhuber [1] proposed LSTM to solve this problem, enabling RNNs to be applied to long-sequence time series. Each LSTM hidden layer neuron has three gate controllers, namely the input gate \(i_t\), forget gate \(f_t\), and output gate \(o_t\). The core component of LSTM is the memory cell, which possesses an internal state, referred to as the cell state \(c_t\), for storing and transmitting information. The input gate controls the inflow of information, the forget gate controls the retention of the hidden units' historical state information, and the output gate controls the outflow of information. The process can be expressed as

$$\begin{aligned} \left\{ \begin{array}{l} i_t=\sigma (U_\textrm{i}x_t+W_\textrm{i}h_{t-1}+V_\textrm{i}c_{t-1}+b_\textrm{i}) \\ f_t=\sigma (U_\textrm{f}x_t+W_\textrm{f}h_{t-1}+V_\textrm{f}c_{t-1}+b_\textrm{f}) \\ c_t=f_t\cdot c_{t-1}+i_t\cdot \tanh (U_\textrm{c}x_t+W_\textrm{c}h_{t-1}+b_\textrm{c}) \\ o_t=\sigma (U_\textrm{o}x_t+W_\textrm{o}h_{t-1}+V_\textrm{o}c_{t-1}+b_\textrm{o}) \\ h_t=o_t\cdot \tanh (c_t) \end{array}\right. \end{aligned}$$
(7)

where U, W, and V are weight matrices (the V terms connect the previous cell state to the gates) and b is a bias vector; \(h_t\) is the activation value of the hidden layer output.
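
A minimal NumPy sketch of one step of Eq. (7) follows. The \(V\) terms act on the previous cell state and are treated here as elementwise (diagonal) weights, a common convention for such peephole-style connections; the parameter layout in a plain dict is an illustrative assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One forward step of the LSTM cell in Eq. (7).

    p holds the parameters: input weights U_*, recurrent weights W_*,
    cell-state weights V_* (elementwise here), and biases b_* for the
    gates i, f, o and the cell candidate c.
    """
    i_t = sigmoid(p["U_i"] @ x_t + p["W_i"] @ h_prev + p["V_i"] * c_prev + p["b_i"])
    f_t = sigmoid(p["U_f"] @ x_t + p["W_f"] @ h_prev + p["V_f"] * c_prev + p["b_f"])
    c_t = f_t * c_prev + i_t * np.tanh(p["U_c"] @ x_t + p["W_c"] @ h_prev + p["b_c"])
    o_t = sigmoid(p["U_o"] @ x_t + p["W_o"] @ h_prev + p["V_o"] * c_prev + p["b_o"])
    h_t = o_t * np.tanh(c_t)                   # hidden activation passed upward
    return h_t, c_t
```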

Appendix B: Dataset description

Fig. 10 Temporal fluctuation curves of the normalized predicted variables in the ETT dataset and HST dataset

The electricity transformer temperature (ETT) is a crucial indicator in the long-term deployment of electric power. The ETT dataset introduced by Zhou et al. [45] is a popular dataset for time series forecasting (TSF) tasks; it is publicly available at https://paperswithcode.com/dataset/ett. The dataset consists of two years of data from two separate counties in China, and the sampling interval is 1 min. Each data point consists of the target value "oil temperature" and six power load features. Figure 10 shows the temporal fluctuation curves of the normalized predicted variables in the ETT dataset and HST dataset. A total of 18,000 samples are taken as the training set, 1,000 as the validation set, and 1,000 as the test set. We selected this dataset for new multi-dimensional time series modeling, aiming to compare the applicability of dimensionality reduction (DR) techniques in various analysis tasks.
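
For illustration, a minimal sketch of how such a normalized, windowed split might be prepared; the file name, window length, and next-step target are illustrative assumptions rather than the exact preprocessing used in the paper (in the public ETT CSVs, the target column "OT" comes last).

```python
import numpy as np
import pandas as pd

df = pd.read_csv("ETTh1.csv", index_col="date", parse_dates=True)  # hypothetical local copy
values = df.to_numpy(dtype=np.float32)       # 6 load features + "OT" target (last column)

# Normalize with training-set statistics only, then cut sliding windows.
n_train, n_val, n_test, window = 18_000, 1_000, 1_000, 24
mean, std = values[:n_train].mean(axis=0), values[:n_train].std(axis=0)
values = (values - mean) / std

def make_windows(arr, start, n, window):
    """n windows of length `window` starting at `start`, with next-step targets."""
    X = np.stack([arr[s : s + window] for s in range(start, start + n)])
    y = arr[start + window : start + window + n, -1]   # next value of the target
    return X, y

X_train, y_train = make_windows(values, 0, n_train, window)
X_val, y_val = make_windows(values, n_train, n_val, window)
X_test, y_test = make_windows(values, n_train + n_val, n_test, window)
```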

Fig. 11 Association changes between the model’s hidden layers and output variable

Appendix C: Additional case study with the ETT dataset and the basic LSTM

Here, we use the basic LSTM for ETT prediction and extract the activations of its hidden layers. The selected LSTM architecture consists of two hidden layers, each comprising 100 neurons. As before, we conducted comparative experiments on the DR techniques across the previously proposed visual analysis tasks. All projections presented were created from the activations of a test set subset; inspecting a training set subset provides similar insights.
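
The following sketch illustrates how such projections can be produced with off-the-shelf implementations (scikit-learn for PCA, MDS, and t-SNE; the umap-learn package for UMAP); the activation file name and the parameter values shown are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import MDS, TSNE
import umap  # from the umap-learn package

# Hypothetical file of hidden activations extracted from the trained LSTM
# on the test subset, shape (n_samples, 100) for a 100-neuron layer.
activations = np.load("lstm_layer2_activations_test.npy")

projections = {
    "PCA": PCA(n_components=2).fit_transform(activations),
    "MDS": MDS(n_components=2).fit_transform(activations),
    "t-SNE": TSNE(n_components=2, perplexity=30).fit_transform(activations),
    "UMAP": umap.UMAP(n_components=2, n_neighbors=15).fit_transform(activations),
}
```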

Fig. 12 Association between activation state and symbolic output variables (“A,” “C,” and “E”)

Table 5 Statistical results of feature importance

The abstract representation of the high and low hidden layers. As shown in Fig. 11, projecting the two hidden layers with t-SNE and UMAP is not conducive to identifying their abstraction ability, as the activation shapes in the projection views of the two layers differ little. After PCA processing, however, the projection points of the low hidden layer show a uniformly distributed circular shape, which cannot separate different types of samples well, while the projection points of the high hidden layer show a distribution with obvious angles and edges, revealing more complex and nonlinear inter-layer abstract representations. Unlike in previous experiments, we do not observe clustered projection scatters for PCA and MDS in the low hidden layer, possibly because the time series fluctuation of the predicted variable in the ETT dataset is more intense than in the HST dataset, as shown in Fig. 10.

The association analysis between model and output variable. As shown in Fig. 11, when the associated object of the activations is continuous, PCA and MDS produce smoother and more continuous scatter distributions than the other DR techniques. Over the course of training, the spatial projection views of all four DR techniques show that the separation of the sample points gradually improves. However, only the projection transformation of PCA matches the magnitude of the actual activation changes, which reflects the advantage of linear DR techniques. Additionally, we use the same method to obtain discrete symbol datasets and project the various classes of samples. The results depicted in Fig. 12 demonstrate that t-SNE and UMAP yield superior separation in this setting.

The importance analysis of input features. As shown in Table 6, we calculate the correlation coefficients between the input variables and the predicted variable. \(X_2\) has the strongest correlation with the predicted variable, followed by \(X_4\) or \(X_6\). Table 5 shows the statistical results of the feature importance obtained by the DR techniques under different parameter settings. We find that the stronger the correlation between an input variable and the predicted variable, the greater its feature importance, especially for DR techniques with a strong ability to preserve the global neighborhood. Therefore, consistent with the previous experiments, PCA and MDS perform better in this task.

Table 6 Statistical results of the correlation coefficient between the input variables and the predicted variable
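
A minimal sketch of the computation behind such a correlation table, assuming the Pearson coefficient (the specific coefficient used is not stated here):

```python
import numpy as np

def feature_target_correlation(X, y):
    """Pearson correlation of each input feature column with the target."""
    return np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

# e.g., with the windowed ETT data sketched in Appendix B, using the input
# features at the last time step of every test window:
# corr = feature_target_correlation(X_test[:, -1, :6], y_test)
```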

The exploration of temporal regularity. After observing the temporal projection views of numerous samples, we noticed that PCA and MDS exhibit more low-quality inflection points, making their projection trajectories less smooth compared to t-SNE and UMAP. Figure 13 also confirms that the appearance of inflection points is associated with the local neighborhood preservation ability of these techniques, which highlights the advantage of t-SNE and UMAP in this task.

Fig. 13 Comparison result of the deviation degree of nodes with different local errors in the temporal trajectory

Since we selected a new dataset and model structure in the experiment, the activation pattern characteristics of the hidden layers we observed have also changed to a certain extent. Nevertheless, the DR techniques still show similar performance to previous experiments in LSTM visual analysis tasks, which reinforces the generalizability of the guidelines proposed in this paper.
