DCS: A Policy Framework for the Detection of Correlated Data Streams

  • Rakan Alseghayer
  • Daniel Petrov
  • Panos K. Chrysanthis
  • Mohamed Sharaf
  • Alexandros Labrinidis
Conference paper
Part of the Lecture Notes in Business Information Processing book series (LNBIP, volume 337)

Abstract

There is an increasing demand for real-time analysis of large volumes of data streams that are produced at high velocity. The most recent data must be processed within a specified delay target for the analysis to lead to actionable results. To this end, we present an effective solution for detecting correlation among such data streams within a micro-batch of a fixed time interval. Our solution, coined DCS, for Detection of Correlated Data Streams, combines (1) incremental sliding-window computation of aggregates, to avoid unnecessary re-computation, (2) intelligent scheduling of computation steps and operations, driven by a utility function within a micro-batch, and (3) an exploration policy that tunes the utility function. Specifically, we propose nine policies that explore correlated pairs of live data streams across consecutive micro-batches. Our experimental evaluation on a real-world dataset shows that some policies are better suited to identifying large numbers of correlated pairs of live data streams that are already known from previous micro-batches, while others are better suited to identifying previously unseen pairs across consecutive micro-batches.
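
To make the first ingredient concrete, the following is a minimal sketch of incremental sliding-window aggregation for one pair of streams. It is not the DCS implementation: the choice of Pearson correlation as the measure, the class name `SlidingPearson`, and the `window` parameter are all assumptions made for illustration. The point it shows is that keeping five running sums lets each new pair of values update the correlation in constant time, without rescanning the window.

```python
from collections import deque
from math import sqrt


class SlidingPearson:
    """Pearson correlation of two streams over the most recent `window` points.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, window):
        self.window = window
        self.points = deque()  # (x, y) pairs currently inside the window
        # Running aggregates over the window.
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def push(self, x, y):
        """Add the newest pair and evict the oldest one if the window is full."""
        self.points.append((x, y))
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.syy += y * y
        self.sxy += x * y
        if len(self.points) > self.window:
            ox, oy = self.points.popleft()
            self.sx -= ox
            self.sy -= oy
            self.sxx -= ox * ox
            self.syy -= oy * oy
            self.sxy -= ox * oy

    def correlation(self):
        """Return the correlation, or None if it is undefined for this window."""
        n = len(self.points)
        if n < 2:
            return None
        cov = n * self.sxy - self.sx * self.sy
        var_x = n * self.sxx - self.sx * self.sx
        var_y = n * self.syy - self.sy * self.sy
        if var_x <= 0 or var_y <= 0:
            return None  # a constant stream has no defined correlation
        return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    tracker = SlidingPearson(window=4)
    for t in range(1, 11):
        tracker.push(float(t), 2.0 * t + 1.0)  # y is a linear function of x
    print(tracker.correlation())
```

Running sums of this kind can lose precision on long streams with large values; a production version would likely need a numerically stabler update, which this sketch leaves out.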

Notes

Acknowledgment

This paper was partially supported by NSF under award CBET-1609120, and NIH under Award U01HL137159. The content is solely the responsibility of the authors and does not represent the views of NSF and NIH.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Pittsburgh, Pittsburgh, USA
  2. Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, UAE