Abstract

Financial time series clustering finds application in forecasting, noise reduction and enhanced index tracking. The central element of every available clustering algorithm is the dissimilarity measure it employs. The dissimilarity measures applicable in the financial domain, as used or suggested in past research, are the correlation based dissimilarity measure, the temporal correlation based dissimilarity measure and the dynamic time warping (DTW) based dissimilarity measure. One shortcoming of these measures is that they do not take into account the lead or lag between the returns of different stocks, which changes with time. Stocks that show a high correlation at some lead or lag mostly belong to the same cluster (or sector). The present paper proposes two new dissimilarity measures, which show superior clustering results compared to past measures when evaluated over 3 data sets comprising 526 companies.

References

  1. Dose, C.: Clustering of financial time series with application to index and enhanced index tracking portfolio. Phys. A 355, 145–151 (2005)
  2. Mantegna, S.: An Introduction to Econophysics Correlations and Complexity in Finance. Cambridge University Press, Cambridge (1999)
  3. Basalto, N., et al.: Hausdorff clustering of financial time series. Phys. A 379, 635–644 (2007)
  4. Saeed, T.: Effective clustering of time-series data using FCM. Int. J. Mach. Learn. Comput. 4(2), 170–176 (2014)
  5. Guan, J.: Cluster financial time series for portfolio. In: Proceedings of the 2007 ICWAPR, Beijing, China, 2–4 November 2007
  6. Marti, A., Nielson, D.: Clustering financial time series: how long is enough? In: Proceedings of the Twenty-Fifth IJCAI (2016)
  7. Aghabozorgi, S., Shirkhorshidi, A.S., Wah, T.Y.: Time-series clustering - a decade review. Inform. Syst. 53, 16–38 (2015)
  8. John, G.: k-shape: efficient and accurate clustering of time series. SIGMOD Rec. 45(1), 69–76 (2016)
  9. Montero, V.: TSclust: an R package for time series clustering. J. Stat. Softw. 62(1), 1–43 (2014)
  10. Murtagh, L.: Ward’s hierarchical agglomerative clustering method: which algorithms implement Ward’s criterion? J. Classif. 31, 274–295 (2014)
  11. Berndt, D.J., Clifford, J.: Using dynamic time warping to find patterns in time series. In: KDD, Workshop, pp. 359–370 (1994)
  12. Chouakria, A.D., Nagabhushan, P.N.: Adaptive dissimilarity index for measuring time series proximity. Adv. Data Anal. Classif. 1(1), 5–21 (2007)
  13. Chouakria-Douzal, A.: Compression technique preserving correlations of a multivariate temporal sequence. In: Advances in Intelligent Data Analysis, pp. 566–577. Springer, Heidelberg (2003)
  14. Gavrilov, M., et al.: Mining the stock market: which measure is best? In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD, pp. 487–496 (2000)
  15. Giorgino, T.: Computing and visualizing dynamic time warping alignments in R: the dtw package. J. Stat. Softw. 31(7), 1–24 (2009)
  16. Hoffmann, M., et al.: Estimation of the lead-lag parameter from non-synchronous data. ISI/BS Bernoulli 19(2), 426–461 (2013)

Acknowledgments

Kartikay Gupta was supported by Teaching Assistantship Grant by Ministry of Human Resource Development, India.

Author information

Corresponding author

Correspondence to Kartikay Gupta.

A Appendix

A.1 Correlation Based Dissimilarity Measure (COR)

A simple dissimilarity measure for time series clustering is based on Pearson’s correlation coefficient between the time series \( X_{T} \) and \( Y_{T} \), given by

$$\begin{aligned} COR(X_{T},Y_{T})=\frac{\sum _{t=1}^{T}(X_{t} - m_{X})(Y_{t} - m_{Y})}{\sqrt{\sum _{t=1}^{T}(X_{t} - m_{X})^{2}}\sqrt{\sum _{t=1}^{T}(Y_{t} - m_{Y})^{2}}} \end{aligned}$$

where \( m_{X} \) and \( m_{Y} \) are the average values of the time series \( X_{T} \) and \( Y_{T} \) respectively. The dissimilarity measure is then given by

$$\begin{aligned} Dissimilarity_{COR}(X_{T},Y_{T})=\sqrt{2(1-COR(X_{T},Y_{T}))} \end{aligned}$$

For more information regarding this distance measure, one may refer to [9].
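The two formulas above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `cor` and `dissimilarity_cor` are hypothetical, NumPy is assumed to be available, and both series are assumed to have equal length and non-zero variance.

```python
import numpy as np

def cor(x, y):
    """Pearson correlation coefficient COR(X_T, Y_T) as defined above."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # X_t - m_X
    dy = y - y.mean()  # Y_t - m_Y
    return float(np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2))))

def dissimilarity_cor(x, y):
    """sqrt(2 * (1 - COR)); ranges from 0 (COR = 1) to 2 (COR = -1)."""
    # max(0, .) guards against a tiny negative argument caused by rounding
    # (an implementation detail assumed here, not part of the formula).
    return float(np.sqrt(max(0.0, 2.0 * (1.0 - cor(x, y)))))

up = [1.0, 2.0, 3.0, 4.0]
down = [4.0, 3.0, 2.0, 1.0]
print(dissimilarity_cor(up, up))    # perfectly correlated series
print(dissimilarity_cor(up, down))  # perfectly anti-correlated series
```

Under this definition, perfectly correlated series are at distance close to 0, uncorrelated series at about \( \sqrt{2} \), and perfectly anti-correlated series at about 2.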

A.2 Temporal Correlation Based Dissimilarity Measure (CORT)

As introduced by Douzal [12], the similarity between two time series is evaluated using the first order temporal correlation coefficient [13], given by

$$\begin{aligned} CORT(X_{T},Y_{T})=\frac{\sum _{t=1}^{T-1}(X_{t+1}-X_{t})(Y_{t+1}-Y_{t})}{\sqrt{\sum _{t=1}^{T-1}(X_{t+1}-X_{t})^{2}}\sqrt{\sum _{t=1}^{T-1}(Y_{t+1}-Y_{t})^{2}}} \end{aligned}$$

The dissimilarity proposed by Douzal [12] modulates the ‘dissimilarity value’ between \( X_T \) and \( Y_T \) using the coefficient \( CORT(X_{T},Y_{T}) \). Specifically, it is defined as follows.

$$\begin{aligned} Dissimilarity_{CORT}(X_{T},Y_{T})=\phi _{k}[CORT(X_{T},Y_{T})] \times Dissimilarity(X_{T},Y_{T}) \end{aligned}$$

where \( \phi _{k} \) is an adaptive function given by,

$$\begin{aligned} \phi _{k}(u) = \frac{2}{1 + e^{ku}},\, k\ge 0 \end{aligned} \tag{6}$$

and Dissimilarity\( (X_{T},Y_{T}) \) refers to the dissimilarity value computed using any of the available dissimilarity measures, such as Euclidean distance or DTW. In this paper, we choose DTW as the underlying dissimilarity measure, because DTW takes slight shape distortions into account when computing the dissimilarity value. For more details regarding DTW, readers are referred to [9, 11].
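The CORT dissimilarity described above can be sketched as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the paper: the function names are hypothetical, the default `k = 2.0` is an arbitrary illustrative value, and `dtw_distance` is a textbook dynamic-programming DTW with absolute difference as local cost and no window constraint, which may differ from the DTW variant in the packages cited in [9, 15].

```python
import numpy as np

def cort(x, y):
    """First order temporal correlation coefficient CORT(X_T, Y_T)."""
    dx = np.diff(np.asarray(x, dtype=float))  # X_{t+1} - X_t
    dy = np.diff(np.asarray(y, dtype=float))  # Y_{t+1} - Y_t
    return float(np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2))))

def phi(u, k=2.0):
    """Adaptive tuning function phi_k(u) = 2 / (1 + exp(k * u)), k >= 0."""
    return float(2.0 / (1.0 + np.exp(k * u)))

def dtw_distance(x, y):
    """Plain DTW: minimum accumulated |x_i - y_j| over all warping paths."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three admissible predecessor cells.
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])

def dissimilarity_cort(x, y, k=2.0):
    """phi_k(CORT(x, y)) * DTW(x, y), following the definition above."""
    return phi(cort(x, y), k) * dtw_distance(x, y)

rising = [0.0, 1.0, 2.0, 3.0]
falling = [3.0, 2.0, 1.0, 0.0]
print(dissimilarity_cort(rising, falling))
```

Because \( \phi _{k} \) is decreasing in its argument, the DTW value is scaled down when the two series move in the same direction (CORT close to 1) and scaled up when they move in opposite directions (CORT close to -1). With \( k = 0 \), \( \phi _{k} \) is identically 1 and the measure reduces to the DTW value alone.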

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Gupta, K., Chatterjee, N. (2018). Financial Time Series Clustering. In: Satapathy, S., Joshi, A. (eds) Information and Communication Technology for Intelligent Systems (ICTIS 2017) - Volume 2. ICTIS 2017. Smart Innovation, Systems and Technologies, vol 84. Springer, Cham. https://doi.org/10.1007/978-3-319-63645-0_16

  • DOI: https://doi.org/10.1007/978-3-319-63645-0_16

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63644-3

  • Online ISBN: 978-3-319-63645-0

  • eBook Packages: Engineering, Engineering (R0)
