Skip to main content

Correlation Set Discovery on Time-Series Data

  • Conference paper
  • First Online:
Database and Expert Systems Applications (DEXA 2019)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11707))

Included in the following conference series:

Abstract

Time-series data analysis is essential in many modern applications, such as financial markets, sensor networks, and data centers, and correlation discovery is a core technique for the analysis. In this paper, we address a novel problem that computes a k-sized time-series dataset where the minimum Pearson correlation of any two time-series in the set is maximized. This problem discovers a group of time-series, which are highly correlated with each other, from a given time-series dataset without any prior knowledge, thus helps many analytical applications. We show that this problem is NP-hard, and design an approximate heuristic solution that provides a high quality result with fast response time. Extensive experiments on real and synthetic datasets verify the efficiency, effectiveness, and scalability of our solution.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://en.wikipedia.org/wiki/Standard_score.

  2. 2.

    https://www.cs.ucr.edu/~eamonn/time_series_data_2018/.

References

  1. Amagata, D., Hara, T.: Mining top-k co-occurrence patterns across multiple streams. TKDE 29(10), 2249–2262 (2017)

    Google Scholar 

  2. Cole, R., Shasha, D., Zhao, X.: Fast window correlations over uncooperative time series. In: KDD, pp. 743–749 (2005)

    Google Scholar 

  3. Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.S.: Locality-sensitive hashing scheme based on p-stable distributions. In: SoCG, pp. 253–262 (2004)

    Google Scholar 

  4. Drosou, M., Pitoura, E.: Diversity over continuous data. IEEE Data Eng. Bull. 32(4), 49–56 (2009)

    Google Scholar 

  5. Gan, J., Feng, J., Fang, Q., Ng, W.: Locality-sensitive hashing scheme based on dynamic collision counting. In: SIGMOD, pp. 541–552 (2012)

    Google Scholar 

  6. Guo, T., Sathe, S., Aberer, K.: Fast distributed correlation discovery over streaming time-series data. In: CIKM, pp. 1161–1170 (2015)

    Google Scholar 

  7. Huang, Q., Feng, J., Zhang, Y., Fang, Q., Ng, W.: Query-aware locality-sensitive hashing for approximate nearest neighbor search. PVLDB 9(1), 1–12 (2015)

    Google Scholar 

  8. Kato, S., Amagata, D., Nishio, S., Hara, T.: Monitoring range motif on streaming time-series. In: DEXA, pp. 251–266 (2018)

    Google Scholar 

  9. Kim, J., Ruggiero, M., Atienza, D., Lederberger, M.: Correlation-aware virtual machine allocation for energy-efficient datacenters. In: DATE, pp. 1345–1350 (2013)

    Google Scholar 

  10. Li, L., Hong, X., Tang, D., Na, M.: GHG emissions, economic growth and urbanization: a spatial approach. Sustainability 8(5), 462 (2016)

    Article  Google Scholar 

  11. Li, Y., Yiu, M.L., Gong, Z., et al.: Quick-motif: an efficient and scalable framework for exact motif discovery. In: ICDE, pp. 579–590 (2015)

    Google Scholar 

  12. Linardi, M., Zhu, Y., Palpanas, T., Keogh, E.: Matrix profile x: VALMOD-scalable discovery of variable-length motifs in data series. In: SIGMOD, pp. 1053–1066 (2018)

    Google Scholar 

  13. Lucas, D., et al.: Designing optimal greenhouse gas observing networks that consider performance and cost. Geosci. Instrum. Methods Data Syst. 4(1), 121 (2015)

    Article  Google Scholar 

  14. Marti, G., Andler, S., Nielsen, F., Donnat, P.: Clustering financial time series: how long is enough?. In: IJCAI, pp. 2583–2589 (2016)

    Google Scholar 

  15. Mueen, A., Keogh, E., Bigdely-Shamlo, N.: Finding time series motifs in disk-resident data. In: ICDM, pp. 367–376 (2009)

    Google Scholar 

  16. Mueen, A., Keogh, E., Zhu, Q., Cash, S., Westover, B.: Exact discovery of time series motifs. In: SDM, pp. 473–484 (2009)

    Google Scholar 

  17. Mueen, A., Nath, S., Liu, J.: Fast approximate correlation for massive time-series data. In: SIGMOD, pp. 171–182 (2010)

    Google Scholar 

  18. Ravi, S.S., Rosenkrantz, D.J., Tayi, G.K.: Facility dispersion problems: heuristics and special cases. In: Dehne, F., Sack, J.-R., Santoro, N. (eds.) WADS 1991. LNCS, vol. 519, pp. 355–366. Springer, Heidelberg (1991). https://doi.org/10.1007/BFb0028275

    Chapter  MATH  Google Scholar 

  19. Reiss, C., Wilkes, J., Hellerstein, J.L.: Google cluster-usage traces: format+ schema, pp. 1–14. Google Inc., White Paper (2011)

    Google Scholar 

  20. Tsytsarau, M., Amer-Yahia, S., Palpanas, T.: Efficient sentiment correlation for large-scale demographics. In: SIGMOD, pp. 253–264 (2013)

    Google Scholar 

  21. Vlachos, M., Hadjieleftheriou, M., Gunopulos, D., Keogh, E.: Indexing multidimensional time-series. VLDB J. 15(1), 1–20 (2006)

    Article  Google Scholar 

  22. Yankov, D., Keogh, E., Medina, J., Chiu, B., Zordan, V.: Detecting time series motifs under uniform scaling. In: KDD, pp. 844–853 (2007)

    Google Scholar 

  23. Yeh, C.C.M., Kavantzas, N., Keogh, E.: Matrix profile vi: meaningful multidimensional motif discovery. In: ICDM, pp. 565–574 (2017)

    Google Scholar 

  24. Yeh, C.C.M., et al.: Matrix profile i: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: ICDM, pp. 1317–1322 (2016)

    Google Scholar 

  25. Yi, X., Zheng, Y., Zhang, J., Li, T.: ST-MVL: filling missing values in geo-sensory time series data. In: IJCAI, pp. 2704–2710 (2016)

    Google Scholar 

  26. Zhu, Y., et al.: Matrix profile ii: exploiting a novel algorithm and GPUs to break the one hundred million barrier for time series motifs and joins. In: ICDM, pp. 739–748 (2016)

    Google Scholar 

  27. Zhu, Y., Shasha, D.: Statstream: statistical monitoring of thousands of data streams in real time. In: VLDB, pp. 358–369 (2002)

    Chapter  Google Scholar 

Download references

Acknowledgment

This research is partially supported by JSPS Grant-in-Aid for Scientific Research (A) Grant Number 18H04095, JSPS Grant-in-Aid for Young Scientists (B) Grant Number JP16K16056, and JST CREST Grant Number J181401085.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Daichi Amagata .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Amagata, D., Hara, T. (2019). Correlation Set Discovery on Time-Series Data. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science(), vol 11707. Springer, Cham. https://doi.org/10.1007/978-3-030-27618-8_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-27618-8_21

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27617-1

  • Online ISBN: 978-3-030-27618-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics