Explaining Interval Sequences by Randomization

Henelius, Andreas; Korpela, Jussi; Puolamäki, Kai

doi:10.1007/978-3-642-40988-2_22

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8188)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

Sequences of events are a ubiquitous form of data. In this paper, we show that it is feasible to present an event sequence as an interval sequence. We show how sequences can be efficiently randomized, how to choose a correct null model and how to use randomizations to derive confidence intervals. Using these techniques, we gain knowledge of the temporal structure of the sequence. Time and Fourier space representations, autocorrelations and arbitrary features can be used as constraints in investigating the data. The methods presented are applied to two real-life datasets: a medical heart interbeat interval dataset and a word dataset from a book. We find that the interval sequence representation and randomization methods provide a powerful way to explore interval sequences and explain their structure.
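
To make the general idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the simplest possible null model, in which the intervals are exchangeable, so a plain shuffle preserves only the distribution of interval lengths; the constrained randomizations mentioned in the abstract (Fourier space, autocorrelations, arbitrary features) are not implemented here. The function names, the toy data and the choice of lag-1 autocorrelation as the test statistic are all illustrative assumptions.

```python
import random


def events_to_intervals(event_times):
    """Turn sorted event times into the gaps between consecutive events."""
    return [b - a for a, b in zip(event_times, event_times[1:])]


def lag1_autocorrelation(xs):
    """Sample lag-1 autocorrelation (0.0 if the sequence is constant)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        return 0.0
    cov = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    return cov / var


def permutation_band(intervals, statistic, n_samples=999, alpha=0.05, seed=0):
    """Empirical (alpha/2, 1 - alpha/2) band of `statistic` under shuffling.

    Assumed null model: the order of the intervals carries no information.
    """
    rng = random.Random(seed)
    shuffled = list(intervals)
    values = []
    for _ in range(n_samples):
        rng.shuffle(shuffled)
        values.append(statistic(shuffled))
    values.sort()
    lo = values[int((alpha / 2) * (n_samples - 1))]
    hi = values[int((1 - alpha / 2) * (n_samples - 1))]
    return lo, hi


# Toy data (made up): gaps grow steadily, so neighbouring intervals are similar.
event_times = [0.0]
for k in range(200):
    event_times.append(event_times[-1] + 1.0 + 0.01 * k)

intervals = events_to_intervals(event_times)
observed = lag1_autocorrelation(intervals)
low, high = permutation_band(intervals, lag1_autocorrelation)
print(f"observed={observed:.3f}, null band=({low:.3f}, {high:.3f})")
```

If the observed statistic falls outside the band, the ordering of the intervals carries structure that this null model does not explain. In the spirit of the abstract, the natural next step would be to add that structure as a constraint on the randomization and test again.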

Author information

Authors and Affiliations

Finnish Institute of Occupational Health, Topeliuksenkatu 41 a A, FI-00250, Helsinki, Finland
Andreas Henelius, Jussi Korpela & Kai Puolamäki

Editor information

Editors and Affiliations

Department of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, 3001, Leuven, Belgium
Hendrik Blockeel
Fraunhofer IAIS, Department of Knowledge Discovery, University of Bonn, Schloss Birlinghoven, 53754, Sankt Augustin, Germany
Kristian Kersting
LIACS, Universiteit Leiden, Niels Bohrweg 1, 2333 CA, Leiden, The Netherlands
Siegfried Nijssen
Department of Computer Science and Engineering, Czech Technical University, Technicka 2, 16627, Prague 6, Czech Republic
Filip Železný

About this paper

Cite this paper

Henelius, A., Korpela, J., Puolamäki, K. (2013). Explaining Interval Sequences by Randomization. In: Blockeel, H., Kersting, K., Nijssen, S., Železný, F. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2013. Lecture Notes in Computer Science, vol 8188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40988-2_22

DOI: https://doi.org/10.1007/978-3-642-40988-2_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40987-5
Online ISBN: 978-3-642-40988-2
eBook Packages: Computer Science, Computer Science (R0)
