Dimension reduction and visualization of multiple time series data: a symbolic data analysis approach

  • Original Paper
  • Computational Statistics

Abstract

Exploratory analysis and visualization of multiple time series data are essential for discovering the underlying dynamics of the series before attempting modeling and forecasting. This study extends two dimension reduction methods, principal component analysis (PCA) and sliced inverse regression (SIR), to multiple time series data. The extension relies on the path point approach, a new addition to the symbolic data analysis framework. Multiple time series data are first transformed into time-dependent intervals marked by starting and ending values, so that each series is geometrically represented as successive directed segments with unique path points. These path points form the basis of the proposed representation. PCA and SIR are then applied to the data table formed by the coordinates of the path points, which makes it possible to visualize the temporal trajectories of objects in a reduced-dimensional subspace. Empirical studies covering simulations, microarray time series data from a yeast cell cycle, and financial data show that the path point approach reveals the structure and behavior of objects in a 2D factorial plane. Comparisons with existing methods, such as the applied vector approach for PCA and SIR on time-dependent interval data, further illustrate the strength and versatility of the path point representation for time series data.
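The pipeline described in the abstract can be sketched roughly as follows for the PCA case. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact definition of a path point, so the sketch assumes that the interval at step t runs from the value at time t to the value at time t+1, and that the path points of a series are the successive endpoints of the resulting directed segments. The function names (`to_directed_segments`, `path_points`, `pca_trajectories`) and the simulated random-walk data are hypothetical.

```python
import numpy as np


def to_directed_segments(series):
    """Turn a (T, p) series into (T-1, 2, p) directed segments.

    Assumption: the interval at step t is [value at t, value at t+1],
    so each segment is stored as a (start point, end point) pair.
    """
    return np.stack([series[:-1], series[1:]], axis=1)


def path_points(segments):
    """Collect the unique successive points along the path.

    Assumption: the end of one segment is the start of the next, so the
    path points are the first start point followed by every end point.
    """
    return np.vstack([segments[:1, 0, :], segments[:, 1, :]])


def pca_trajectories(data, n_components=2):
    """Project the path points of n objects onto the leading principal axes.

    data has shape (n, T, p): n objects, T time points, p variables.
    Returns the (n, T, n_components) trajectories and the share of
    variance explained by each retained component.
    """
    n, T, p = data.shape
    # Data table whose rows are path-point coordinates, stacked object by object.
    table = np.vstack([path_points(to_directed_segments(obj)) for obj in data])
    centered = table - table.mean(axis=0)
    # PCA via the singular value decomposition of the centered table.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    ratio = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores.reshape(n, T, n_components), ratio


# Simulated example (hypothetical data): 5 objects, 12 time points, 4 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(5, 12, 4)).cumsum(axis=1)
traj, ratio = pca_trajectories(data)
print(traj.shape)
```

Each slice `traj[i]` is the trajectory of object i in the 2D factorial plane and can be drawn as a connected, directed line to follow the object over time. The SIR variant would additionally need a response or grouping variable, which is left out of this sketch.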

Acknowledgements

We are grateful for the valuable comments provided by the Editor, Associate Editor, and anonymous referees. Their input has helped us improve the paper immensely.

Funding

This research was supported by the Ministry of Science and Technology of Taiwan, R.O.C., under grants MOST103-2118-M-032-006 and MOST111-2628-E-038-001-MY2.

Author information

Corresponding author

Correspondence to Han-Ming Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (PDF 184 kb)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Su, E.CY., Wu, HM. Dimension reduction and visualization of multiple time series data: a symbolic data analysis approach. Comput Stat 39, 1937–1969 (2024). https://doi.org/10.1007/s00180-023-01440-7

  • DOI: https://doi.org/10.1007/s00180-023-01440-7
