ADS: the adaptive data series index

Zoumpatianos, Kostas; Idreos, Stratos; Palpanas, Themis

doi:10.1007/s00778-016-0442-5


Regular Paper
Published: 31 August 2016

The VLDB Journal, Volume 25, pages 843–866 (2016)


Abstract

Numerous applications continuously produce large volumes of data series, and in several time-critical scenarios analysts need to query these data as soon as they become available. This, however, is not currently possible with state-of-the-art indexing methods for very large data series collections. In this paper, we present the first adaptive indexing mechanism specifically tailored to indexing and querying very large data series collections. We present a detailed design and evaluation of our method using approximate and exact query algorithms with both synthetic and real data sets. Adaptive indexing significantly outperforms previous solutions, gracefully handling large data series collections and reducing the data-to-query delay: by the time state-of-the-art indexing techniques finish indexing 1 billion data series (and before answering even a single query), our method has already answered \(3 \times 10^5\) queries.
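The abstract only summarizes the principle of adaptive indexing: instead of building a complete index before the first query, the index is refined as a side effect of the queries themselves. The Python sketch below illustrates that general principle under our own assumptions; it is not the ADS algorithm described in the paper. It summarizes each series with PAA (Piecewise Aggregate Approximation, the mean of each of `w` equal segments) rather than the summarizations used in the paper, starts from a single unrefined leaf, and splits a leaf on the query's path only when that leaf holds more than `leaf_size` series. Exact search seeds a best-so-far answer from the approximate leaf and then skips every series whose PAA lower bound, \(\sqrt{n/w}\,\lVert \mathrm{PAA}(q) - \mathrm{PAA}(x) \rVert\), cannot beat it. The names `AdaptiveIndex`, `leaf_size`, `w`, and the split policy are hypothetical, the series length is assumed divisible by `w`, and all data are kept in memory.

```python
import math
import random
import statistics
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def paa(series: List[float], w: int) -> List[float]:
    """Piecewise Aggregate Approximation: the mean of each of w equal segments.
    Assumes len(series) is divisible by w."""
    seg = len(series) // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Node:
    ids: List[int] = field(default_factory=list)  # series ids, leaves only
    dim: Optional[int] = None                      # PAA segment used for the split
    split: Optional[float] = None                  # threshold on that segment
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


class AdaptiveIndex:
    """Hypothetical sketch: leaves are refined only when a query reaches them."""

    def __init__(self, data: List[List[float]], w: int = 4, leaf_size: int = 8):
        self.data, self.w, self.leaf_size = data, w, leaf_size
        self.summaries = [paa(s, w) for s in data]    # the only up-front work
        self.root = Node(ids=list(range(len(data))))  # one unrefined leaf
        self.splits = 0

    def _split(self, node: Node) -> bool:
        def vals(seg: int) -> List[float]:
            return [self.summaries[i][seg] for i in node.ids]

        # Split on the PAA segment with the widest value range in this leaf.
        dim = max(range(self.w), key=lambda seg: max(vals(seg)) - min(vals(seg)))
        split = statistics.median(vals(dim))
        left = [i for i in node.ids if self.summaries[i][dim] <= split]
        right = [i for i in node.ids if self.summaries[i][dim] > split]
        if not left or not right:  # the median did not separate the leaf
            return False
        node.dim, node.split = dim, split
        node.left, node.right, node.ids = Node(ids=left), Node(ids=right), []
        self.splits += 1
        return True

    def _leaf_for(self, q_paa: List[float]) -> Node:
        node = self.root
        while True:
            # Lazily split oversized leaves, but only on this query's path.
            if node.is_leaf():
                if len(node.ids) <= self.leaf_size or not self._split(node):
                    return node
            node = node.left if q_paa[node.dim] <= node.split else node.right

    def approximate(self, q: List[float]) -> Tuple[int, float]:
        """Best match inside the single leaf the query is routed to."""
        leaf = self._leaf_for(paa(q, self.w))
        best = min(leaf.ids, key=lambda i: euclidean(q, self.data[i]))
        return best, euclidean(q, self.data[best])

    def exact(self, q: List[float]) -> Tuple[int, float]:
        """Seed with the approximate answer, then prune with the PAA lower bound."""
        best, best_d = self.approximate(q)
        q_paa = paa(q, self.w)
        scale = math.sqrt(len(q) / self.w)
        for i, s in enumerate(self.summaries):
            if scale * euclidean(q_paa, s) >= best_d:  # bound cannot beat best
                continue
            d = euclidean(q, self.data[i])
            if d < best_d:
                best, best_d = i, d
        return best, best_d


if __name__ == "__main__":
    random.seed(0)
    data = [[random.gauss(0, 1) for _ in range(16)] for _ in range(1000)]
    idx = AdaptiveIndex(data, w=4, leaf_size=16)
    print("splits before any query:", idx.splits)  # 0: nothing refined yet
    for _ in range(5):
        q = [random.gauss(0, 1) for _ in range(16)]
        print(idx.exact(q), "splits so far:", idx.splits)
```

Because a leaf is split only when a query lands in it, regions of the data that no query touches remain one coarse leaf, and early queries pay part of the construction cost. The split policy here (widest segment, median threshold) is an arbitrary choice made for illustration.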


Notes

This paper is an extended version of [22]. It describes an exact search algorithm and a new full index construction method, both outperforming the state of the art. It also includes more detailed discussions and additional experiments.

References

[1] Huijse, P., Estévez, P.A., Protopapas, P., Principe, J.C., Zegers, P.: Computational intelligence challenges and applications on large-scale astronomical time series databases. IEEE Comput. Intell. Mag. 9(3), 27–39 (2014)
[2] Kashino, K., Smith, G., Murase, H.: Time-series active search for quick retrieval of audio and video. In: ICASSP (1999)
[3] Raza, U., Camerra, A., Murphy, A.L., Palpanas, T., Picco, G.P.: Practical data prediction for real-world wireless sensor networks. IEEE Trans. Knowl. Data Eng. 27(8), 2231–2244 (2015)
[4] Shasha, D.: Tuning time series queries in finance: case studies and recommendations. IEEE Data Eng. Bull. 22(2), 40–46 (1999)
[5] Ye, L., Keogh, E.J.: Time series shapelets: a new primitive for data mining. In: KDD (2009)
[6] Bu, Y., Leung, T.W., Fu, A.W.C., Keogh, E., Pei, J., Meshkin, S.: WAT: finding top-k discords in time series database. In: SDM (2007)
[7] Dallachiesa, M., Nushi, B., Mirylenka, K., Palpanas, T.: Uncertain time-series similarity: return to the basics. PVLDB 5(11), 1662–1673 (2012)
[8] Dallachiesa, M., Palpanas, T., Ilyas, I.F.: Top-k nearest neighbor search in uncertain data series. PVLDB 8(1), 13–24 (2014)
[9] Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., Keogh, E.: Searching and mining trillions of time series subsequences under dynamic time warping. In: KDD (2012)
[10] Rodrigues, P., Gama, J., Pedroso, J.: Hierarchical clustering of time-series data streams. IEEE Trans. Knowl. Data Eng. 20(5), 615–627 (2008)
[11] Wang, Y., Wang, P., Pei, J., Wang, W., Huang, S.: A data-adaptive and dynamic segmentation index for whole matching on time series. PVLDB 6(10), 793–804 (2013)
[12] Camerra, A., Palpanas, T., Shieh, J., Keogh, E.: iSAX 2.0: indexing and mining one billion time series. In: ICDM (2010)
[13] QualiMaster: a configurable real-time data processing infrastructure mastering autonomous quality adaptation. Deliverable D1.1: initial use cases and requirements. Technical report, QualiMaster Project (2014)
[14] Rogers, S.: Big data is scaling BI and analytics. Information Management. http://www.information-management.com/issues/21_5/big-data-is-scaling-bi-and-analytics-10021093-1.html (2011). Accessed 28 Aug 2016
[15] ADHD-200. http://fcon_1000.projects.nitrc.org/indi/adhd200/ (2011)
[16] Sloan Digital Sky Survey. https://www.sdss3.org/dr10/data_access/volume.php (2015)
[17] Idreos, S., Alagiannis, I., Johnson, R., Ailamaki, A.: Here are my data files. Here are my queries. Where are my results? In: CIDR (2011)
[18] Idreos, S., Liarou, E.: dbTouch: analytics at your fingertips. In: CIDR (2013)
[19] Guttman, A.: R-trees: a dynamic index structure for spatial searching. In: SIGMOD (1984)
[20] Berchtold, S., Keim, D.A., Kriegel, H.P.: The X-tree: an index structure for high-dimensional data. In: VLDB (1996)
[21] Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)
[22] Zoumpatianos, K., Idreos, S., Palpanas, T.: Indexing for interactive exploration of big data series. In: SIGMOD (2014)
[23] Agrawal, R., Faloutsos, C., Swami, A.N.: Efficient similarity search in sequence databases. In: FODO Conference (1993)
[24] Keogh, E.J., Pazzani, M.J.: An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. In: KDD (1998)
[25] Rakthanmanon, T., Keogh, E.J., Lonardi, S., Evans, S.: Time series epenthesis: clustering time series streams requires ignoring some data. In: ICDE (2011)
[26] Liao, T.W.: Clustering of time series data: a survey. Pattern Recognit. 38(11), 1857–1874 (2005)
[27] Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009)
[28] Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., Keogh, E.J.: Experimental comparison of representation methods and distance measures for time series data. DMKD 26(2), 275–309 (2013)
[29] Chen, L., Özsu, M.T., Oria, V.: Robust and fast similarity search for moving object trajectories. In: SIGMOD (2005)
[30] Vlachos, M., Gunopulos, D., Kollios, G.: Discovering similar multidimensional trajectories. In: ICDE (2002)
[31] Palpanas, T., Vlachos, M., Keogh, E.J., Gunopulos, D.: Streaming time series summarization using user-defined amnesic functions. TKDE 20(7), 992–1006 (2008)
[32] Palpanas, T., Vlachos, M., Keogh, E.J., Gunopulos, D., Truppel, W.: Online amnesic approximation of streaming time series. In: ICDE, pp. 339–349 (2004)
[33] Chan, K.P., Fu, A.C.: Efficient time series matching by wavelets. In: ICDE (1999)
[34] Keogh, E., Chakrabarti, K., Pazzani, M.: Dimensionality reduction for fast similarity search in large time series databases. KAIS 3(3), 263–286 (2000)
[35] Yi, B., Faloutsos, C.: Fast time sequence indexing for arbitrary Lp norms. In: VLDB (2000)
[36] Lin, J., Keogh, E., Lonardi, S.: A symbolic representation of time series, with implications for streaming algorithms. In: DMKD, pp. 2–11 (2003)
[37] Assent, I., Krieger, R., Afschari, F., Seidl, T.: The TS-tree: efficient time series search and retrieval. In: EDBT (2008)
[38] Shieh, J., Keogh, E.: iSAX: indexing and mining terabyte sized time series. In: KDD (2008)
[39] Shieh, J., Keogh, E.: iSAX: disk-aware mining and indexing of massive time series datasets. DMKD 19(1), 24–57 (2009)
[40] Graefe, G., Halim, F., Idreos, S., Kuno, H.A., Manegold, S.: Concurrency control for adaptive indexing. PVLDB 5(7), 656–667 (2012)
[41] Graefe, G., Halim, F., Idreos, S., Kuno, H.A., Manegold, S., Seeger, B.: Transactional support for adaptive indexing. VLDB J. 23(2), 303–328 (2014)
[42] Halim, F., Idreos, S., Karras, P., Yap, R.H.C.: Stochastic database cracking: towards robust adaptive indexing in main-memory column-stores. PVLDB 5(6), 502–513 (2012)
[43] Idreos, S., Kersten, M.L., Manegold, S.: Updating a cracked database. In: SIGMOD, pp. 413–424 (2007)
[44] Idreos, S., Kersten, M.L., Manegold, S.: Database cracking. In: CIDR (2007)
[45] Idreos, S., Kersten, M.L., Manegold, S.: Self-organizing tuple reconstruction in column-stores. In: SIGMOD (2009)
[46] Idreos, S., Manegold, S., Kuno, H.A., Graefe, G.: Merging what’s cracked, cracking what’s merged: adaptive indexing in main-memory column-stores. PVLDB 4(9), 585–597 (2011)
[47] Schuhknecht, F.M., Jindal, A., Dittrich, J.: The uncracked pieces in database cracking. PVLDB 7(2), 97–108 (2013)
[48] Richter, S., Quiané-Ruiz, J.-A., Schuh, S., Dittrich, J.: Towards zero-overhead static and adaptive indexing in Hadoop. VLDB J. 23(3), 469–494 (2013)
[49] Zhou, J., Ross, K.A.: Buffering accesses to memory-resident index structures. In: VLDB (2003)
[50] Zhou, J., Ross, K.A.: Buffering database operations for enhanced instruction cache performance. In: SIGMOD (2004)
[51] Stonebraker, M.: The case for partial indexes. SIGMOD Rec. 18(4), 4–11 (1989)
[52] Achakeev, D., Seeger, B.: Efficient bulk updates on multiversion B-trees. PVLDB 6(14), 1834–1845 (2013)
[53] Ghanem, T.M., Shah, R., Mokbel, M.F., Aref, W.G., Vitter, J.S.: Bulk operations for space-partitioning trees. In: ICDE (2004)
[54] Zoumpatianos, K., Lou, Y., Palpanas, T., Gehrke, J.: Query workloads for data series indexes. In: KDD (2015)
[55] Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast subsequence matching in time-series databases. In: SIGMOD (1994)
[56] Rafiei, D., Mendelzon, A.: Similarity-based queries for time series data. In: SIGMOD, pp. 13–25 (1997)
[57] Jégou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. TPAMI 33(1), 117–128 (2011)
[58] Camerra, A., Shieh, J., Palpanas, T., Rakthanmanon, T., Keogh, E.: Beyond one billion time series: indexing and mining very large time series collections with iSAX2+. KAIS 39(1), 123–151 (2014)
[59] Incorporated Research Institutions for Seismology: Seismic Data Access. http://ds.iris.edu/data/access/ (2016)
[60] Soldi, S., Beckmann, V., Baumgartner, W., Ponti, G., Shrader, C.R., Lubiński, P., Krimm, H., Mattana, F., Tueller, J.: Long-term variability of AGN at hard X-rays. Astron. Astrophys. 563, A57 (2014)
[61] Kashyap, S., Karras, P.: Scalable kNN search on vertically stored time series. In: KDD (2011)
[62] Palpanas, T.: Data series management: the road to big sequence analytics. SIGMOD Rec. 44(2), 47–52 (2015)
[63] Zoumpatianos, K., Idreos, S., Palpanas, T.: RINSE: interactive data series exploration with ADS+. PVLDB 8(12), 1912–1923 (2015)
[64] du Mouza, C., Litwin, W., Rigaux, P.: SD-Rtree: a scalable distributed R-tree. In: ICDE (2007)
[65] Wang, J., Wu, S., Gao, H., Li, J., Ooi, B.C.: Indexing multi-dimensional data in a cloud system. In: SIGMOD (2010)
[66] Xie, Y., Palsetia, D., Trajcevski, G., Agrawal, A., Choudhary, A.N.: SILVERBACK: scalable association mining for temporal data in columnar probabilistic databases. In: ICDE (2014)


Acknowledgments

We would like to thank Prof. Volker Beckmann for providing us with the Astro data set [60].

Author information

Authors and Affiliations

University of Trento, Trento, TN, Italy
Kostas Zoumpatianos
Harvard University, Cambridge, MA, USA
Stratos Idreos
Paris Descartes University, Paris, France
Themis Palpanas


Corresponding author

Correspondence to Kostas Zoumpatianos.


About this article

Cite this article

Zoumpatianos, K., Idreos, S. & Palpanas, T. ADS: the adaptive data series index. The VLDB Journal 25, 843–866 (2016). https://doi.org/10.1007/s00778-016-0442-5


Received: 01 December 2015
Revised: 20 August 2016
Accepted: 23 August 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s00778-016-0442-5
