DCS: A Policy Framework for the Detection of Correlated Data Streams

  • Conference paper
Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 337)

Abstract

There is an increasing demand for real-time analysis of large volumes of data streams that are produced at high velocity. The most recent data needs to be processed within a specified delay target in order for the analysis to lead to actionable results. To this end, we present an effective solution for detecting correlated data streams within a micro-batch of a fixed time interval. Our solution, DCS (Detection of Correlated Data Streams), combines (1) incremental sliding-window computation of aggregates, to avoid unnecessary re-computations, (2) intelligent scheduling of computation steps and operations, driven by a utility function within a micro-batch, and (3) an exploration policy that tunes the utility function. Specifically, we propose nine policies that explore correlated pairs of live data streams across consecutive micro-batches. Our experimental evaluation on a real-world dataset shows that some policies are better suited to identifying high numbers of correlated pairs of live data streams that are already known from previous micro-batches, while others are better suited to identifying previously unseen pairs of live data streams across consecutive micro-batches.
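To illustrate the first ingredient, incremental sliding-window computation of aggregates, the following is a minimal sketch and not the authors' implementation. It assumes the correlation measure is the Pearson coefficient and that the aggregates are running sums kept per pair of streams; the class name `SlidingPearson` and its methods are hypothetical. Scheduling, the utility function, and the exploration policies are not modeled here.

```python
from collections import deque
from math import sqrt


class SlidingPearson:
    """Pearson correlation of two streams over the last `window` points.

    Illustrative sketch only: the running sums below are an assumed choice
    of aggregates, not necessarily the ones used in DCS.
    """

    def __init__(self, window):
        self.window = window
        self.points = deque()
        # Running aggregates for the points currently in the window.
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        """Add the newest pair and evict the oldest one in O(1)."""
        self.points.append((x, y))
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        if len(self.points) > self.window:
            ox, oy = self.points.popleft()
            self.sx -= ox
            self.sy -= oy
            self.sxx -= ox * ox
            self.syy -= oy * oy
            self.sxy -= ox * oy

    def correlation(self):
        """Return the correlation of the current window, or None if undefined."""
        n = len(self.points)
        if n < 2:
            return None
        cov = n * self.sxy - self.sx * self.sy
        var_x = n * self.sxx - self.sx ** 2
        var_y = n * self.syy - self.sy ** 2
        if var_x <= 0 or var_y <= 0:
            return None  # a constant stream has no defined correlation
        return cov / sqrt(var_x * var_y)
```

Each arriving point changes the aggregates by a constant amount of work, so the window is never rescanned. Running sums of this kind can lose precision on long streams with large values, which a production system would need to address.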

Notes

  1. In DCS [12, 13], Blind was referred to as Cold Start, whereas the other policies proposed here are instances of Warm Start.

References

  1. Kalinin, A., Cetintemel, U., Zdonik, S.: Searchlight: enabling integrated search and exploration over large multidimensional data. PVLDB 8(10), 1094–1105 (2015)

  2. Orang, M., Shiri, N.: Improving performance of similarity measures for uncertain time series using preprocessing techniques. In: ACM SSDBM, pp. 31:1–31:12 (2015)

  3. Zacharatou, E.T., Tauheed, F., Heinis, T., Ailamaki, A.: RUBIK: efficient threshold queries on massive time series. In: ACM SSDBM, pp. 18:1–18:12 (2015)

  4. Lee, D., Sim, A., Choi, J., Wu, K.: Novel data reduction based on statistical similarity. In: ACM SSDBM, pp. 21:1–21:12 (2016)

  5. Shafer, I., Ren, K., Boddeti, V.N., Abe, Y., Ganger, G.R., Faloutsos, C.: RainMon: an integrated approach to mining bursty timeseries monitoring data. In: ACM SIGKDD, pp. 1158–1166 (2012)

  6. Zhu, Y., Shasha, D.: StatStream: statistical monitoring of thousands of data streams in real time. In: VLDB, pp. 358–369 (2002)

  7. Cole, R., Shasha, D., Zhao, X.: Fast window correlations over uncooperative time series. In: ACM SIGKDD, pp. 743–749 (2005)

  8. Jankov, D., Sikdar, S., Mukherjee, R., Teymourian, K., Jermaine, C.: Real-time high performance anomaly detection over data streams: grand challenge. In: ACM DEBS, pp. 292–297 (2017)

  9. Zoumpatianos, K., Idreos, S., Palpanas, T.: Indexing for interactive exploration of big data series. In: ACM SIGMOD, pp. 1555–1566 (2014)

  10. Idreos, S., Papaemmanouil, O., Chaudhuri, S.: Overview of data exploration techniques. In: ACM SIGMOD, pp. 277–281 (2015)

  11. Feng, K., Cong, G., Bhowmick, S.S., Peng, W.C., Miao, C.: Towards best region search for data exploration. In: ACM SIGMOD, pp. 1055–1070 (2016)

  12. Petrov, D., Alseghayer, R., Sharaf, M., Chrysanthis, P.K., Labrinidis, A.: Interactive exploration of correlated time series. In: ACM ExploreDB, pp. 2:1–2:6 (2017)

  13. Alseghayer, R., Petrov, D., Chrysanthis, P.K., Sharaf, M., Labrinidis, A.: Detection of highly correlated live data streams. In: BIRTE, pp. 3:1–3:8 (2017)

  14. Sakurai, Y., Papadimitriou, S., Faloutsos, C.: BRAID: stream mining through group lag correlations. In: ACM SIGMOD, pp. 599–610 (2005)

  15. Yahoo Inc.: Yahoo finance historical data (2016)

  16. Kalinin, A., Cetintemel, U., Zdonik, S.: Interactive data exploration using semantic windows. In: ACM SIGMOD, pp. 505–516 (2014)

  17. Mueen, A., Nath, S., Liu, J.: Fast approximate correlation for massive time-series data. In: ACM SIGMOD, pp. 171–182 (2010)

  18. Mirylenka, K., Dallachiesa, M., Palpanas, T.: Data series similarity using correlation-aware measures. In: ACM SSDBM, pp. 11:1–11:12 (2017)

Acknowledgment

This paper was partially supported by NSF under Award CBET-1609120 and by NIH under Award U01HL137159. The content is solely the responsibility of the authors and does not represent the views of NSF or NIH.

Author information

Corresponding authors

Correspondence to Rakan Alseghayer or Panos K. Chrysanthis.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Alseghayer, R., Petrov, D., Chrysanthis, P.K., Sharaf, M., Labrinidis, A. (2019). DCS: A Policy Framework for the Detection of Correlated Data Streams. In: Castellanos, M., Chrysanthis, P., Pelechrinis, K. (eds) Real-Time Business Intelligence and Analytics. BIRTE 2015, BIRTE 2016, BIRTE 2017. Lecture Notes in Business Information Processing, vol 337. Springer, Cham. https://doi.org/10.1007/978-3-030-24124-7_12

  • DOI: https://doi.org/10.1007/978-3-030-24124-7_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24123-0

  • Online ISBN: 978-3-030-24124-7

  • eBook Packages: Computer Science, Computer Science (R0)
