StatStream

  • Dennis Shasha
  • Yunyue Zhu
Part of the Monographs in Computer Science book series (MCS)

Summary

Consider the problem of monitoring tens of thousands of time series data streams online and making decisions based on them. In addition to single-stream statistics such as the average and standard deviation, we also want to find the most highly correlated pairs of streams, especially over a sliding window. A stock market trader might use such a tool to spot arbitrage opportunities. In this chapter, we propose efficient methods for solving this problem based on Discrete Fourier Transforms (see Chapter 2) and a three-level time interval hierarchy. Extensive experiments on synthetic data and real-world financial trading data show that our algorithm beats the direct computation approach by several orders of magnitude. It also improves on previous Fourier Transform approaches by allowing the efficient computation of time-delayed correlation over sliding windows of any size and with any time delay. Correlation also lends itself to an efficient grid-based data structure. The result is the first algorithm we know of that computes correlations over thousands of data streams in real time. The algorithm is incremental, has a fixed response time, and can monitor the pairwise correlations of 10,000 streams on a single PC. It is also embarrassingly parallelizable.
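The core idea can be illustrated briefly. The sketch below is a minimal, non-incremental illustration that assumes Pearson correlation over equal-length windows, a unitary DFT computed naively, and hypothetical helper names (`normalize`, `dft_prefix`, `features`, `approx_corr`, `correlated_pairs`). Once each window is shifted to zero mean and scaled to unit norm, its correlation with another window equals 1 - d^2/2, where d is the Euclidean distance between the two normalized windows. Parseval's theorem then lets a few low-frequency DFT coefficients stand in for d. The grid step is a simplified stand-in for the chapter's grid structure, and coefficients are recomputed from scratch for every window rather than updated incrementally as the chapter's algorithm does.

```python
import cmath
import math
from collections import defaultdict
from itertools import product


def normalize(window):
    """Shift to zero mean and scale to unit Euclidean norm.

    After this, Pearson correlation is the dot product of two windows
    and their squared Euclidean distance is 2 - 2 * corr.
    """
    mean = sum(window) / len(window)
    centered = [v - mean for v in window]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0:
        raise ValueError("constant window: correlation is undefined")
    return [v / norm for v in centered]


def pearson(a, b):
    """Exact Pearson correlation of two equal-length windows."""
    return sum(p * q for p, q in zip(normalize(a), normalize(b)))


def dft_prefix(x, n):
    """Unitary DFT coefficients X_1..X_n of x, computed naively.

    X_0 is skipped because it is zero for a zero-mean window.
    """
    w = len(x)
    if not 0 < n < w / 2:
        raise ValueError("need 0 < n < w/2 so no coefficient is counted twice")
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / w) for t in range(w))
        / math.sqrt(w)
        for k in range(1, n + 1)
    ]


def features(window, n):
    """Real vector whose squared distances give the n-coefficient estimate of d^2.

    For real input, X_{w-k} is the conjugate of X_k, so each kept
    coefficient stands for two terms of Parseval's sum (hence sqrt(2)).
    """
    coeffs = dft_prefix(normalize(window), n)
    return [math.sqrt(2) * part for c in coeffs for part in (c.real, c.imag)]


def approx_corr(a, b, n):
    """Estimate corr(a, b) as 1 - d^2/2 from n coefficients per window.

    The omitted coefficients only add distance, so this estimate
    never falls below the true correlation.
    """
    d2 = sum((p - q) ** 2 for p, q in zip(features(a, n), features(b, n)))
    return 1 - d2 / 2


def correlated_pairs(windows, threshold, n=2, grid_dims=2):
    """Index pairs (i, j), i < j, whose exact correlation is >= threshold.

    corr >= threshold is equivalent to distance <= eps = sqrt(2 - 2*threshold).
    The feature distance is never larger, so every grid coordinate of a
    qualifying pair differs by at most eps: with cells of side eps, the
    two windows land in the same cell or in adjacent cells.
    """
    if not -1 <= threshold < 1 or not 0 < grid_dims <= 2 * n:
        raise ValueError("need -1 <= threshold < 1 and 0 < grid_dims <= 2n")
    eps = math.sqrt(2 - 2 * threshold)
    grid = defaultdict(list)
    for i, win in enumerate(windows):
        coords = features(win, n)[:grid_dims]
        grid[tuple(math.floor(v / eps) for v in coords)].append(i)
    found = set()
    for cell, members in grid.items():
        for offset in product((-1, 0, 1), repeat=grid_dims):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for i in members:
                for j in grid.get(neighbor, []):  # .get avoids inserting keys
                    # Exact check removes false positives from the grid.
                    if i < j and pearson(windows[i], windows[j]) >= threshold:
                        found.add((i, j))
    return sorted(found)
```

For example, with `a = [1, 2, 3, 4, 5, 4, 3, 2, 1]`, `b = [2 * v for v in a]`, and `c = [6 - v for v in a]`, the call `correlated_pairs([a, b, c], 0.9)` keeps only the perfectly correlated pair `(0, 1)`. Because dropping coefficients can only shrink the distance, the grid never discards a pair that meets the threshold; it only limits how many pairs reach the exact check.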

Keywords

Data Stream, Discrete Fourier Transform, Grid Structure, Wall Clock Time, Sliding Window

Copyright information

© Dennis E. Shasha and Yunyue Zhu 2004

Authors and Affiliations

  • Dennis Shasha (1)
  • Yunyue Zhu (1)
  1. Courant Institute, New York, USA