
Confidence bands for time series data

Data Mining and Knowledge Discovery

Abstract

Simultaneous confidence intervals, or confidence bands, provide an intuitive description of the variability of a time series. Given a set of \(N\) time series of length \(M\), we consider the problem of finding a confidence band that contains a \((1-\alpha )\)-fraction of the observations. We construct such confidence bands by finding the set of \(N-K\) time series whose envelope has minimum width. We refer to this problem as the minimum width envelope problem. We show that the minimum width envelope problem is \(\mathbf {NP}\)-hard, and we develop a greedy heuristic algorithm, which we compare to quantile- and distance-based confidence band methods. We also describe a method to find an effective confidence level \(\alpha _{\mathrm {eff}}\) and an effective number of observations to remove \(K_{\mathrm {eff}}\), such that the resulting confidence bands keep the family-wise error rate below \(\alpha \). We evaluate our methods on synthetic and real datasets. We demonstrate that our method can be used to construct confidence bands with guaranteed family-wise error rate control, even when there is too little data for the quantile-based methods to work.
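The abstract states the minimum width envelope problem only in words. The following is a minimal sketch under two assumptions: first, that the width of an envelope is the sum over the \(M\) time points of the gap between the pointwise maximum and minimum of the kept series; second, that a simple greedy rule is used, dropping one at a time the series whose removal narrows the envelope most. The function names are hypothetical. This naive version recomputes the width from scratch at every step. It is not the authors' Algorithm 1, whose efficiency relies on the order data structure described in the appendix.

```python
from typing import List, Sequence, Tuple


def envelope_width(series: Sequence[Sequence[float]]) -> float:
    """Sum over time points j of (max_i X[i][j] - min_i X[i][j]) for the given series."""
    m = len(series[0])
    return sum(max(s[j] for s in series) - min(s[j] for s in series) for j in range(m))


def greedy_mwe(X: Sequence[Sequence[float]], k: int) -> List[int]:
    """Assumed greedy rule: remove k series one at a time, each time dropping the
    series whose removal leaves the narrowest envelope. Returns removed indices in order."""
    if not 0 <= k < len(X):
        raise ValueError("k must satisfy 0 <= k < N")
    remaining = list(range(len(X)))
    removed: List[int] = []
    for _ in range(k):
        best_i, best_w = remaining[0], float("inf")
        for i in remaining:  # naive: O(N * N * M) work per removal step
            w = envelope_width([X[r] for r in remaining if r != i])
            if w < best_w:  # strict '<' keeps the first candidate on ties
                best_i, best_w = i, w
        remaining.remove(best_i)
        removed.append(best_i)
    return removed


def band(X: Sequence[Sequence[float]], removed: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pointwise lower and upper limits of the envelope of the series that were kept."""
    drop = set(removed)
    kept = [x for i, x in enumerate(X) if i not in drop]
    m = len(X[0])
    lower = [min(s[j] for s in kept) for j in range(m)]
    upper = [max(s[j] for s in kept) for j in range(m)]
    return lower, upper
```

For example, with `X = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [10, 0, 10]]`, `envelope_width(X)` is 22. `greedy_mwe(X, 1)` returns `[3]`, and `band(X, [3])` gives the band `([0, 0, 0], [2, 2, 2])`.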

Notes

  1. https://bitbucket.org/jtkorpel/mwe_2014.

  2. http://physionet.org/.

  3. http://physionet.org/physiobank/database/mitdb/.

  4. ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily.

  5. http://www.ncdc.noaa.gov/.

  6. http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+power+consumption.

  7. https://bitbucket.org/jtkorpel/mwe_2014.

  8. http://en.wikipedia.org/wiki/2003_European_heat_wave.

    http://en.wikipedia.org/wiki/2006_European_heat_wave.

    http://en.wikipedia.org/wiki/2007_European_heat_wave.

Acknowledgments

The authors would like to thank Andreas Henelius for helpful discussions and suggestions. The work of J. Korpela and K. Puolamäki was supported in part by the Revolution of Knowledge Work Project, funded by Tekes (The Finnish Funding Agency for Innovation).

Author information

Correspondence to Jussi Korpela.

Additional information

Responsible editors: Toon Calders, Floriana Esposito, Eyke Hüllermeier, Rosa Meo.

Appendix

1.1 Efficient implementation of the order data structure \(\mathrm{R}\)

This section describes the data structure \(\mathrm{R}\), referred to in Algorithm 1, that makes the mwe algorithm efficient. \(\mathrm{R}\) stores the ordering information for each column \(j\) of the data matrix \(\mathbf {X}[i,j]\). A substructure \(\mathrm{R_j}\) for a single column \(j\) with \(N=5\) observations is shown in Fig. 11a. The rank order of the values in column \(j\) is stored in a doubly linked list. The first element holds the index \(i\) of the smallest element in \(\mathbf {X}[\cdot ,j]\), the second element holds the index of the second smallest value, and so on. The indices of the largest and second largest values, and of the smallest and second smallest values, can be extracted in \(O(1)\) time for a single column \(j\), or in \(O(M)\) time for all columns (all values of \(j\)).
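A minimal Python sketch of a single substructure \(\mathrm{R_j}\) follows. It assumes 0-based observation indices and at least two observations per column. The class and method names (`ColumnOrder`, `two_smallest`, `two_largest`) are hypothetical and do not come from the authors' code. Building the list needs one sort of the column; afterwards the extreme pairs are read from the two ends of the list without any search.

```python
from typing import Optional, Sequence, Tuple


class Node:
    """List node holding an observation (row) index i."""
    __slots__ = ("index", "prev", "next")

    def __init__(self, index: int) -> None:
        self.index = index
        self.prev: Optional["Node"] = None  # node with the next-smaller value
        self.next: Optional["Node"] = None  # node with the next-larger value


class ColumnOrder:
    """Hypothetical sketch of R_j: row indices of one column linked in ascending value order."""

    def __init__(self, column: Sequence[float]) -> None:
        if len(column) < 2:
            raise ValueError("need at least two observations")
        order = sorted(range(len(column)), key=lambda i: column[i])  # O(N log N)
        nodes = [Node(i) for i in order]
        for a, b in zip(nodes, nodes[1:]):  # link neighbours in rank order
            a.next, b.prev = b, a
        self.head = nodes[0]   # node of the smallest value
        self.tail = nodes[-1]  # node of the largest value

    def two_smallest(self) -> Tuple[int, int]:
        """(smallest, second smallest) observation indices, in O(1)."""
        return self.head.index, self.head.next.index

    def two_largest(self) -> Tuple[int, int]:
        """(largest, second largest) observation indices, in O(1)."""
        return self.tail.index, self.tail.prev.index
```

With the column `[3.0, 1.0, 4.0, 1.5, 9.0]` (\(N=5\)), `two_smallest()` returns `(1, 3)` and `two_largest()` returns `(4, 2)`.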

Fig. 11

a An example data structure \(\mathrm{R_j}\) that combines a doubly linked list and an index vector so that retrieving the largest and second largest ranks, together with the associated observation indices, is a constant-time operation. This structure allows the efficient implementation of rows 9–10 and 13 in Algorithm 1. b The same data structure with observation \(i=4\) removed, showing the update of links within the list

The substructure \(\mathrm{R_j}\) additionally contains a vector of length \(N\) whose \(i\)th item is a pointer to the node of the doubly linked list holding the value \(i\). With the help of this vector, the node corresponding to any time series \(i\) can be deleted (bypassed) from the doubly linked list, as shown in Fig. 11b. This takes \(O(1)\) time for a single column \(j\) and \(O(M)\) time for the whole time series. The data structure can be initialized in \(O(MN\log {N})\) time, since each of the \(M\) columns is sorted once in \(O(N\log {N})\) time, and it has a memory requirement of \(O(MN)\).
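The following is a self-contained sketch of the full structure \(\mathrm{R}\), including the length-\(N\) index vector and the deletion step. It makes the same assumptions as above (0-based indices), with Python object references standing in for pointers. The names `OrderStructure`, `node_of`, `remove`, `extremes`, and `ranked` are hypothetical and not taken from the authors' code.

```python
from typing import List, Optional, Sequence, Set, Tuple


class Node:
    """One node of a column's doubly linked list; holds an observation index i."""
    __slots__ = ("index", "prev", "next")

    def __init__(self, index: int) -> None:
        self.index = index
        self.prev: Optional["Node"] = None  # neighbour with the next-smaller value
        self.next: Optional["Node"] = None  # neighbour with the next-larger value


class OrderStructure:
    """Hypothetical sketch of R: per column j, a rank-ordered doubly linked list
    plus a length-N vector mapping observation i to its node in that list."""

    def __init__(self, X: Sequence[Sequence[float]]) -> None:
        n, m = len(X), len(X[0])
        self.heads: List[Optional[Node]] = []  # smallest remaining value per column
        self.tails: List[Optional[Node]] = []  # largest remaining value per column
        self.node_of: List[List[Optional[Node]]] = []  # node_of[j][i]
        self.removed: Set[int] = set()
        for j in range(m):  # one O(N log N) sort per column: O(M N log N) total
            order = sorted(range(n), key=lambda i: X[i][j])
            nodes = [Node(i) for i in order]
            for a, b in zip(nodes, nodes[1:]):
                a.next, b.prev = b, a
            vec: List[Optional[Node]] = [None] * n
            for node in nodes:
                vec[node.index] = node  # index vector: series i -> its node
            self.heads.append(nodes[0])
            self.tails.append(nodes[-1])
            self.node_of.append(vec)

    def remove(self, i: int) -> None:
        """Bypass the node of series i in every column: O(1) per column, O(M) total."""
        if i in self.removed:
            raise ValueError(f"series {i} already removed")
        self.removed.add(i)
        for j, vec in enumerate(self.node_of):
            node = vec[i]
            if node.prev is None:
                self.heads[j] = node.next  # removed node was the column minimum
            else:
                node.prev.next = node.next
            if node.next is None:
                self.tails[j] = node.prev  # removed node was the column maximum
            else:
                node.next.prev = node.prev
            node.prev = node.next = None
            vec[i] = None

    def extremes(self, j: int) -> Tuple[int, int]:
        """Indices of the smallest and largest remaining values in column j, in O(1)."""
        return self.heads[j].index, self.tails[j].index

    def ranked(self, j: int) -> List[int]:
        """Remaining indices of column j in ascending value order (O(N), for inspection)."""
        out, node = [], self.heads[j]
        while node is not None:
            out.append(node.index)
            node = node.next
        return out
```

For instance, with the rows `[3.0, 0.5]`, `[1.0, 2.0]`, `[4.0, 0.1]`, `[1.5, 7.0]`, `[9.0, 3.0]`, calling `remove(4)` moves the maximum of column 0 from series 4 to series 2. It also splices series 4 out of the middle of column 1's list, leaving the order `[2, 0, 1, 3]`.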

About this article

Cite this article

Korpela, J., Puolamäki, K. & Gionis, A. Confidence bands for time series data. Data Min Knowl Disc 28, 1530–1553 (2014). https://doi.org/10.1007/s10618-014-0371-0

  • DOI: https://doi.org/10.1007/s10618-014-0371-0
