Data Mining and Knowledge Discovery, Volume 28, Issue 5–6, pp 1530–1553

Confidence bands for time series data

Abstract

Simultaneous confidence intervals, or confidence bands, provide an intuitive description of the variability of a time series. Given a set of \(N\) time series of length \(M\), we consider the problem of finding a confidence band that contains a \((1-\alpha)\)-fraction of the observations. We construct such confidence bands by removing \(K\) time series so that the envelope of the remaining \(N-K\) time series has minimum width. We refer to this problem as the minimum width envelope problem. We show that the minimum width envelope problem is \(\mathbf{NP}\)-hard, and we develop a greedy heuristic algorithm, which we compare to quantile- and distance-based confidence band methods. We also describe a method to find an effective confidence level \(\alpha_{\mathrm{eff}}\) and an effective number of observations to remove \(K_{\mathrm{eff}}\) such that the resulting confidence bands keep the family-wise error rate below \(\alpha\). We evaluate our methods on synthetic and real datasets. We demonstrate that our method can be used to construct confidence bands with guaranteed family-wise error rate control, even when there is too little data for the quantile-based methods to work.
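The abstract does not spell out the greedy heuristic, so the following Python sketch is an illustration under stated assumptions, not the authors' algorithm. It assumes the envelope width is the sum over the \(M\) time points of (upper − lower), and at each of \(K\) steps it drops the series whose removal gives the narrowest envelope. All helper names (`envelope`, `envelope_width`, `greedy_min_width_envelope`, `band_coverage`) are hypothetical. `band_coverage` returns the fraction of series that stay inside the band at every time point; one minus this value is an empirical estimate of the family-wise error rate, reading an "error" as a series that leaves the band at one or more time points.

```python
import numpy as np


def envelope(X):
    """Pointwise lower and upper envelope of the rows of X (shape: n_series x M)."""
    return X.min(axis=0), X.max(axis=0)


def envelope_width(X):
    """Envelope width, ASSUMED here to be the sum over time points of (upper - lower)."""
    lower, upper = envelope(X)
    return float(np.sum(upper - lower))


def greedy_min_width_envelope(X, K):
    """Hypothetical greedy heuristic: remove K rows of X one at a time, each time
    dropping the row whose removal yields the narrowest envelope.
    Returns the indices of the N - K rows that are kept."""
    n = X.shape[0]
    if not 0 <= K < n:
        raise ValueError("K must satisfy 0 <= K < N")
    keep = list(range(n))
    for _ in range(K):
        best_i, best_w = None, np.inf
        for i in keep:
            rest = [j for j in keep if j != i]
            w = envelope_width(X[rest])
            if w < best_w:  # strict '<': ties go to the lowest index
                best_i, best_w = i, w
        keep.remove(best_i)
    return keep


def band_coverage(lower, upper, Y):
    """Fraction of rows of Y lying inside [lower, upper] at every time point."""
    inside = np.all((Y >= lower) & (Y <= upper), axis=1)
    return float(inside.mean())


# Toy data: N = 4 series of length M = 3; the last one is an outlier.
X = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0],
    [0.5, 0.5, 0.5],
    [10.0, -10.0, 10.0],
])
keep = greedy_min_width_envelope(X, K=1)
lower, upper = envelope(X[keep])
print(keep)                            # [0, 1, 2]
print(envelope_width(X[keep]))         # 3.0
print(band_coverage(lower, upper, X))  # 0.75
```

Each greedy step re-evaluates every remaining series, so this naive version costs roughly \(O(K N^2 M)\). Only series that attain the minimum or maximum at some time point can change the envelope when removed, which a faster variant could exploit.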

Keywords

Simultaneous confidence interval · Confidence band · Time series · Multiplicity correction · Family-wise error rate

Notes

Acknowledgments

The authors would like to thank Andreas Henelius for helpful discussions and suggestions. The work of J. Korpela and K. Puolamäki was supported in part by the Revolution of Knowledge Work Project, funded by Tekes (The Finnish Funding Agency for Innovation).

Supplementary material

Supplementary material 1 (pdf 36 KB)
Supplementary material 2 (pdf 25 KB)
Supplementary material 3 (pdf 59 KB)
Supplementary material 4 (pdf 3090 KB)

Copyright information

© The Author(s) 2014

Authors and Affiliations

  • Jussi Korpela (1)
  • Kai Puolamäki (1)
  • Aristides Gionis (2)
  1. Finnish Institute of Occupational Health, Helsinki, Finland
  2. Department of Information and Computer Science, Helsinki Institute for Information Technology (HIIT), Aalto University, Aalto, Helsinki, Finland
