The VLDB Journal

, Volume 25, Issue 6, pp 843–866 | Cite as

ADS: the adaptive data series index

  • Kostas Zoumpatianos
  • Stratos Idreos
  • Themis Palpanas
Regular Paper

Abstract

Numerous applications continuously produce big amounts of data series, and in several time critical scenarios analysts need to be able to query these data as soon as they become available. This, however, is not currently possible with the state-of-the-art indexing methods and for very large data series collections. In this paper, we present the first adaptive indexing mechanism, specifically tailored to solve the problem of indexing and querying very large data series collections. We present a detailed design and evaluation of our method using approximate and exact query algorithms with both synthetic and real data sets. Adaptive indexing significantly outperforms previous solutions, gracefully handling large data series collections, reducing the data to query delay: By the time state-of-the-art indexing techniques finish indexing 1 billion data series (and before answering even a single query), our method has already answered \(3*10^5\) queries.

References

  1. 1.
    Huijse, P., Estévez, P.A., Protopapas, P., Principe, J.C., Zegers, P.: Computational intelligence challenges and applications on large-scale astronomical time series databases. IEEE Comput. Intell. Mag. 9(3), 27–39 (2014)CrossRefGoogle Scholar
  2. 2.
    Kashino, K., Smith, G., Murase, H.: Time-series active search for quick retrieval of audio and video. In: ICASSP (1999)Google Scholar
  3. 3.
    Raza, U., Camerra, A., Murphy, A.L., Palpanas, T., Picco, G.P.: Practical data prediction for real-world wireless sensor networks. IEEE Trans. Knowl. Data Eng. 27(8), 2231–2244 (2015)Google Scholar
  4. 4.
    Shasha, D.: Tuning time series queries in finance: case studies and recommendations. IEEE Data Eng. Bull. 22(2), 40–46 (1999)Google Scholar
  5. 5.
    Ye, L., Keogh, E.J.: Time series shapelets: a new primitive for data mining. In: KDD (2009)Google Scholar
  6. 6.
    Bu, Y., Wing L.T., Chee F.A.W., Keogh, E., Pei, J., Meshkin, S.: Wat: finding top-k discords in time series database. In: SDM (2007)Google Scholar
  7. 7.
    Dallachiesa, M., Nushi, B., Mirylenka, K., Palpanas, T.: Uncertain time-series similarity: return to the basics. PVLDB 5(11), 1662–1673 (2012)Google Scholar
  8. 8.
    Dallachiesa, M., Palpanas, T., Ilyas, I.F.: Top-k nearest neighbor search in uncertain data series. PVLDB 8(1), 13–24 (2014)Google Scholar
  9. 9.
    Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., Zakaria, J., Keogh, E.: Searching and mining trillions of time series subsequences under dynamic time warping. In: KDD (2012)Google Scholar
  10. 10.
    Rodrigues, P., Gama, J., Pedroso, J.: Hierarchical clustering of time-series data streams. IEEE Trans. Knowl. Data Eng. 20(5), 615–627 (2008)CrossRefGoogle Scholar
  11. 11.
    Wang, Y., Wang, P., Pei, J., Wang, W., Huang, S.: A data-adaptive and dynamic segmentation index for whole matching on time series. PVLDB 6(10), 793–804 (2013)Google Scholar
  12. 12.
    Camerra, A., Palpanas, T., Shieh, J., Keogh, E.: iSAX 2.0: indexing and mining one billion time series. In: ICDM (2010)Google Scholar
  13. 13.
    QualiMaster a configurable real-time data processing infrastructure mastering autonomous quality adaptation—deliverable D1.1: initial use cases and requirements. Technical report, QualiMaster Project (2014)Google Scholar
  14. 14.
    Rogers, S.: Big data is scaling bi and analytics Information Management. http://www.information-management.com/issues/21_5/big-data-is-scaling-bi-and-analytics-10021093-1.html (2011). Accessed 28 Aug 2016
  15. 15.
    Adhd-200. http://fcon_1000.projects.nitrc.org/indi/adhd200/ (2011)Google Scholar
  16. 16.
    Sloan digital sky survey. https://www.sdss3.org/dr10/data_access/volume.php (2015)
  17. 17.
    Idreos, S., Alagiannis, I., Johnson, R., Ailamaki, A.: Here are my data files. Here are my queries. Where are my results? In: CIDR (2011)Google Scholar
  18. 18.
    Idreos, S., Liarou, E.: dbtouch: analytics at your fingertips. In: CIDR (2013)Google Scholar
  19. 19.
    Guttman, A.: R-trees a dynamic structure for spatial searching. In: SIGMOD (1984)Google Scholar
  20. 20.
    Berchtold, S., Keim, D.A., Kriegel, H.P.: The X-tree: an index structure for high-dimensional data. In: VLDB (1996)Google Scholar
  21. 21.
    Bentley, J.L.: Multidimensional binary search trees used for associative searching. Commun. ACM 18(9), 509–517 (1975)MathSciNetCrossRefMATHGoogle Scholar
  22. 22.
    Zoumpatianos, K., Idreos, S., Palpanas, T.: Indexing for interactive exploration of big data series. In: SIGMOD (2014)Google Scholar
  23. 23.
    Agrawal, R., Faloutsos, C., Swami, A.N.: Efficient similarity search in sequence databases. In: FODO Conference (1993)Google Scholar
  24. 24.
    Keogh, E.J., Pazzani, M.J.: An enhanced representation of time series which allows fast and accurate classification, clustering and relevance feedback. In: KDD (1998)Google Scholar
  25. 25.
    Rakthanmanon, T., Keogh, E.J., Lonardi, S., Evans, S.: Time series epenthesis: clustering time series streams requires ignoring some data. In: ICDE (2011)Google Scholar
  26. 26.
    Warren, T.W.: Clustering of time series data—a survey. Pattern Recognit. 38(11), 1857–1874 (2005)CrossRefMATHGoogle Scholar
  27. 27.
    Chandola, V., Banerjee, A., Kumar, V.: Anomaly detection: a survey. ACM Comput. Surv. 41(3), 1–58 (2009)CrossRefGoogle Scholar
  28. 28.
    Wang, X., Mueen, A., Ding, H., Trajcevski, G., Scheuermann, P., Keogh, E.J.: Experimental comparison of representation methods and distance measures for time series data. DMKD 26(2), 275–309 (2013)MathSciNetGoogle Scholar
  29. 29.
    Chen, L., Özsu, M.T., Oria, V.: Robust and fast similarity search for moving object trajectories. In: SIGMOD (2005)Google Scholar
  30. 30.
    Vlachos, M., Gunopulos, D., Kollios, G.: Discovering similar multidimensional trajectories. In: ICDE (2002)Google Scholar
  31. 31.
    Palpanas, T., Vlachos, M., Keogh, E.J., Gunopulos, D.: Streaming time series summarization using user-defined amnesic functions. TKDE 20(7), 992–1006 (2008)Google Scholar
  32. 32.
    Palpanas, T., Vlachos, M., Keogh, E.J., Gunopulos, D., Truppel, W.: Online amnesic approximation of streaming time series. In: ICDE, pp. 339–349 (2004)Google Scholar
  33. 33.
    Chan, K.P., Fu, A.C.: Efficient time series matching by wavelets. In: ICDE (1999)Google Scholar
  34. 34.
    Keogh, E., Chakrabarti, K., Pazzani, M.: Dimensionality reduction for fast similarity search in large time series databases. KAIS 3(3), 263–286 (2000)MATHGoogle Scholar
  35. 35.
    Yi, B., Faloutsos, C.: Fast time sequence indexing for arbitrary lp norms. In: VLDB (2000)Google Scholar
  36. 36.
    Lin, J., Keogh, E., Lonardi, S.: A symbolic representation of time series, with implications for streaming algorithms. In: DMKD, pp. 2–11 (2003)Google Scholar
  37. 37.
    Assent, I., Krieger, R., Afschari, F., Seidl, T.: The TS-tree: efficient time series search and retrieval. In: EDBT (2008)Google Scholar
  38. 38.
    Shieh, J., Keogh, E.: iSAX: indexing and mining terabyte sized time series. In: KDD (2008)Google Scholar
  39. 39.
    Shieh, J., Keogh, E.: iSAX: disk-aware mining and indexing of massive time series datasets. DMKD 19(1), 24–57 (2009)MathSciNetGoogle Scholar
  40. 40.
    Graefe, G., Halim, F., Idreos, S., Kuno, H.A., Manegold, S.: Concurrency control for adaptive indexing. PVLDB 5(7), 656–667 (2012)Google Scholar
  41. 41.
    Graefe, G., Halim, F., Idreos, S., Kuno, H.A., Manegold, S., Seeger, B.: Transactional support for adaptive indexing. VLDB J. 23(2), 303–328 (2014)CrossRefGoogle Scholar
  42. 42.
    Halim, F., Idreos, S., Karras, P., Yap, R.H.C.: Stochastic database cracking: towards robust adaptive indexing in main-memory column-stores. PVLDB 5(6), 502–513 (2012)Google Scholar
  43. 43.
    Idreos, S., Kersten, M.L., Manegold, S.: Updating a cracked database. In: SIGMOD, pp. 413–424 (2007)Google Scholar
  44. 44.
    Idreos, S., Kersten, M.L., Manegold, S.: Database cracking. In: CIDR (2007)Google Scholar
  45. 45.
    Idreos, S., Kersten, M.L., Manegold, S.: Self-organizing tuple reconstruction in column-stores. In: SIGMOD (2009)Google Scholar
  46. 46.
    Idreos, S., Manegold, S., Kuno, H.A., Graefe, G.: Merging what’s cracked, cracking what’s merged: adaptive indexing in main-memory column-stores. PVLDB 4(9), 585–597 (2011)Google Scholar
  47. 47.
    Schuhknecht, F.M., Jindal, A., Dittrich, J.: The uncracked pieces in database cracking. PVLDB 7(2), 97–108 (2013)Google Scholar
  48. 48.
    Richter, S., Quiane-Ruiz, J.-A., Schuh, S., Dittrich, J.: Towards zero-overhead static and adaptive indexing in hadoop. VLDBJ 23(3), 469–494 (2013)Google Scholar
  49. 49.
    Zhou, J., Ross, K.A.: Buffering accesses to memory-resident index structures. In: VLDB (2003)Google Scholar
  50. 50.
    Zhou, J., Ross, K.A., Buffering database operations for enhanced instruction cache performance. In: SIGMOD (2004)Google Scholar
  51. 51.
    Stonebraker, M.: The case for partial indexes. SIGMOD Rec. 18(4), 4–11 (1989)CrossRefGoogle Scholar
  52. 52.
    Achakeev, D., Seeger, B.: Efficient bulk updates on multiversion b-trees. PVLDB 6(14), 1834–1845 (2013)Google Scholar
  53. 53.
    Ghanem, T.M., Shah, R., Mokbel, M.F., Aref, W.G., Vitter, J.S.: Bulk operations for space-partitioning trees. In: ICDE (2004)Google Scholar
  54. 54.
    Zoumpatianos, K., Lou, Y., Palpanas, T., Gehrke, J.: Query workloads for data series indexes. In: KDD (2015)Google Scholar
  55. 55.
    Faloutsos, C., Ranganathan, M., Manolopoulos, Y.: Fast subsequence matching in time-series databases. In: SIGMOD (1994)Google Scholar
  56. 56.
    Rafiei, D., Mendelzon, A.: Similarity-based queries for time series data. In: SIGMOD, pp. 13–25 (1997)Google Scholar
  57. 57.
    Jegou, H., Douze, M., Schmid, C.: Product quantization for nearest neighbor search. TPAMI 33(1), 117–128 (2011)CrossRefGoogle Scholar
  58. 58.
    Camerra, A., Shieh, J., Palpanas, T., Rakthanmanon, T., Keogh, E.: Beyond one billion time series: indexing and mining very large time series collections with iSAX2+. KAIS 39(1), 123–151 (2014)Google Scholar
  59. 59.
    Incorporated Research Institutions for Seismology—Seismic Data Access. http://ds.iris.edu/data/access/ (2016)
  60. 60.
    Soldi, S., Beckmann, V., Baumgartner, W., Ponti, G., Shrader, C.R., Lubiński, P., Krimm, H., Mattana, F., Tueller, J.: Long-term variability of agn at hard X-rays. Astron. Astrophys. 563, A57 (2014)CrossRefGoogle Scholar
  61. 61.
    Kashyap, S., Karras, P.: Scalable kNN search on vertically stored time series. In: KDD (2011)Google Scholar
  62. 62.
    Palpanas, T.: Data series management: the road to big sequence analytics. SIGMOD Rec. 44(2), 47–52 (2015)CrossRefGoogle Scholar
  63. 63.
    Zoumpatianos, K., Idreos, S., Palpanas, T.: RINSE: interactive data series exploration with ADS+. PVLDB 8(12), 1912–1923 (2015)Google Scholar
  64. 64.
    du Mouza, C., Litwin, W., Rigaux, P.: SD-Rtree: a scalable distributed rtree. In: ICDE (2007)Google Scholar
  65. 65.
    Wang, J., Wu, S., Gao, H., Li, J., Ooi, B.C,: Indexing multi-dimensional data in a cloud system. In: SIGMOD (2010)Google Scholar
  66. 66.
    Xie, Y., Palsetia, D., Trajcevski, G., Agrawal, A., Choudhary, A.N.: SILVERBACK: scalable association mining for temporal data in columnar probabilistic databases. In: ICDE (2014)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  1. 1.University of TrentoTrentoItaly
  2. 2.Harvard UniversityCambridgeUSA
  3. 3.Paris Descartes UniversityParisFrance

Personalised recommendations