# Confidence bands for time series data

## Abstract

Simultaneous confidence intervals, or *confidence bands*, provide an intuitive description of the variability of a time series. Given a set of \(N\) time series of length \(M\), we consider the problem of finding a confidence band that contains a \((1-\alpha)\)-fraction of the observations. We construct such confidence bands by finding the set of \(N-K\) time series whose envelope has minimum width. We refer to this problem as the *minimum width envelope* problem. We show that the minimum width envelope problem is \(\mathbf{NP}\)-hard, and we develop a greedy heuristic algorithm, which we compare to quantile- and distance-based confidence band methods. We also describe a method to find an effective confidence level \(\alpha_{\mathrm{eff}}\) and an effective number of observations to remove \(K_{\mathrm{eff}}\), such that the resulting confidence bands keep the family-wise error rate below \(\alpha\). We evaluate our methods on synthetic and real datasets, and we demonstrate that our method can be used to construct confidence bands with guaranteed family-wise error rate control, even when there is too little data for the quantile-based methods to work.
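The minimum width envelope objective can be made concrete with a short sketch. It assumes that the width of an envelope is the sum, over the \(M\) time points, of the gap between the pointwise maximum and minimum of the kept series, and it uses one natural greedy rule (repeatedly drop the series whose removal narrows the envelope the most), which need not match the authors' heuristic exactly. All function names (`envelope`, `width`, `greedy_mwe`, `exact_mwe`, `coverage`) are hypothetical. Because the problem is \(\mathbf{NP}\)-hard, the exhaustive search is included only as a reference for tiny inputs.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Series = Sequence[float]


def envelope(data: Sequence[Series], keep: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Pointwise lower and upper envelope of the series whose indices are in `keep`."""
    m = len(data[0])
    lower = [min(data[i][j] for i in keep) for j in range(m)]
    upper = [max(data[i][j] for i in keep) for j in range(m)]
    return lower, upper


def width(data: Sequence[Series], keep: Sequence[int]) -> float:
    """Envelope width, ASSUMED here to be the sum over time points of (upper - lower)."""
    lower, upper = envelope(data, keep)
    return sum(u - l for l, u in zip(lower, upper))


def greedy_mwe(data: Sequence[Series], k: int) -> List[int]:
    """Hypothetical greedy rule: K times, drop the series whose removal shrinks the width most.

    Assumes 0 <= k < len(data), so at least one series is always kept.
    """
    keep = list(range(len(data)))
    for _ in range(k):
        # Evaluate every single removal; min() keeps the first candidate on ties.
        best = min(keep, key=lambda r: width(data, [i for i in keep if i != r]))
        keep.remove(best)
    return keep


def exact_mwe(data: Sequence[Series], k: int) -> List[int]:
    """Brute force over all (N-K)-subsets; exponential in N, only for tiny inputs."""
    n = len(data)
    best = min(combinations(range(n), n - k), key=lambda s: width(data, s))
    return list(best)


def coverage(lower: Series, upper: Series, data: Sequence[Series]) -> float:
    """Fraction of series that lie inside the band at every time point."""
    inside = sum(
        all(l <= x <= u for x, l, u in zip(series, lower, upper)) for series in data
    )
    return inside / len(data)


# Toy example: three flat series and one series with a single spike.
data = [
    [0, 0, 0],
    [1, 1, 1],
    [-1, -1, -1],
    [10, 0, 0],
]
keep = greedy_mwe(data, k=1)          # drops the spiky series, keeps [0, 1, 2]
lower, upper = envelope(data, keep)   # band from -1 to 1 at every time point
print(width(data, keep), coverage(lower, upper, data))
```

On this toy input the script prints `6 0.75`: removing the spiky series shrinks the width from 15 to 6, and three of the four series stay inside the band at every time point. The brute-force search examines all \(\binom{N}{K}\) subsets, which is why a heuristic is needed at realistic sizes. The sketch does not implement the calibration of \(\alpha_{\mathrm{eff}}\) and \(K_{\mathrm{eff}}\); `coverage` only reports the empirical fraction of whole series contained in a band, which is roughly one minus the empirical family-wise error rate over the \(M\) time points.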

## Keywords

Simultaneous confidence interval, Confidence band, Time series, Multiplicity correction, Family-wise error rate

## Notes

### Acknowledgments

The authors would like to thank Andreas Henelius for helpful discussions and suggestions. The work of J. Korpela and K. Puolamäki was supported in part by the Revolution of Knowledge Work Project, funded by Tekes (The Finnish Funding Agency for Innovation).
