Modeling Large Time Series for Efficient Approximate Query Processing

Perera, Kasun S.; Hahmann, Martin; Lehner, Wolfgang; Pedersen, Torben Bach; Thomsen, Christian

doi:10.1007/978-3-319-22324-7_16

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9052)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Evolving customer requirements and increasing competition force business organizations to store ever larger amounts of data and to query them for information at any given time. With the current growth of data volumes, extracting relevant information in a timely manner is becoming increasingly difficult with traditional methods. In addition, contemporary Decision Support Systems (DSS) favor fast approximations over slower exact results. More generally, processes that exchange data become inefficient when connection bandwidth does not grow as fast as data volume. To tackle these issues, compression techniques have been introduced in many areas of data processing. In this paper, we outline a new system that does not query complete datasets but instead uses models to extract the requested information. For time series data, we derive the models using Fourier and cosine transforms and piecewise aggregation. These models are initially created from the original data and stored in the database alongside it. Subsequent queries are answered using the stored models rather than by scanning and processing the original datasets. To support query processing over models, we maintain query statistics derived from experiments and collected while the system is running. Our approach can also reduce communication load by exchanging models instead of data. To allow seamless integration of model-based querying into traditional data warehouses, we introduce an SQL-compatible query terminology. Our experiments show that querying models is up to 80% faster than querying the raw data while retaining high accuracy.
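As a rough illustration of the model-based idea described in the abstract, the Python sketch below builds two compact models of a time series, a truncated orthonormal discrete cosine transform (DCT-II) and a piecewise aggregate approximation (PAA), and answers an average-over-range query from a stored model instead of the raw values. It is a sketch under stated assumptions, not the authors' system: the function names (`dct_model`, `dct_reconstruct`, `paa_model`, `paa_reconstruct`, `range_avg`) are hypothetical, keeping only the first `keep` DCT coefficients is an assumed model-selection rule, the Fourier variant is omitted for brevity, and queries are answered by reconstructing the series from the model.

```python
import math
from typing import List, Sequence


def _dct_scale(k: int, n: int) -> float:
    # Orthonormal scaling factors for DCT-II / DCT-III.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_model(series: Sequence[float], keep: int) -> List[float]:
    """Orthonormal DCT-II of `series`, truncated to the first `keep` coefficients.

    Hypothetical model: only low-frequency coefficients are stored.
    """
    n = len(series)
    coeffs = []
    for k in range(min(keep, n)):
        acc = sum(x * math.cos(math.pi / n * (t + 0.5) * k) for t, x in enumerate(series))
        coeffs.append(_dct_scale(k, n) * acc)
    return coeffs


def dct_reconstruct(coeffs: Sequence[float], n: int) -> List[float]:
    """Inverse orthonormal DCT (DCT-III); coefficients that were not stored count as zero."""
    out = []
    for t in range(n):
        value = 0.0
        for k, c in enumerate(coeffs):
            value += _dct_scale(k, n) * c * math.cos(math.pi / n * (t + 0.5) * k)
        out.append(value)
    return out


def paa_model(series: Sequence[float], segments: int) -> List[float]:
    """Piecewise aggregate approximation: the mean of each equal-width segment."""
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("len(series) must be a positive multiple of segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width for i in range(segments)]


def paa_reconstruct(means: Sequence[float], n: int) -> List[float]:
    """Expand segment means back to a step function of length n."""
    width = n // len(means)
    return [means[t // width] for t in range(n)]


def range_avg(values: Sequence[float], start: int, end: int) -> float:
    """AVG over the half-open index range [start, end)."""
    window = values[start:end]
    return sum(window) / len(window)


if __name__ == "__main__":
    # Hypothetical synthetic hourly readings; not data from the paper.
    raw = [10.0 + 5.0 * math.sin(2 * math.pi * t / 24) for t in range(96)]
    n = len(raw)
    dct_approx = dct_reconstruct(dct_model(raw, keep=12), n)
    paa_approx = paa_reconstruct(paa_model(raw, segments=8), n)
    print("exact AVG[0,48):", range_avg(raw, 0, 48))
    print("DCT   AVG[0,48):", range_avg(dct_approx, 0, 48))
    print("PAA   AVG[0,48):", range_avg(paa_approx, 0, 48))
```

In this sketch only the short coefficient or mean lists would need to be stored or exchanged, which is where the storage and communication savings would come from. Both models preserve the average over the full series exactly (the higher DCT basis functions sum to zero over the full length, and equal-width PAA segment means are equally weighted), so approximation error shows up only in sub-range queries.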

Acknowledgment

This research has been funded by the European Commission through the Erasmus Mundus Joint Doctorate, Information Technologies for Business Intelligence - Doctoral College (IT4BI-DC).

Author information

Authors and Affiliations

Database Technology Group, Technische Universität Dresden, Dresden, Germany
Kasun S. Perera, Martin Hahmann & Wolfgang Lehner
Department of Computer Science, Aalborg University, Aalborg, Denmark
Torben Bach Pedersen & Christian Thomsen

Corresponding author

Correspondence to Kasun S. Perera.

Editor information

Editors and Affiliations

Soochow University, Suzhou, China
An Liu
Nagoya University, Nagoya, Japan
Yoshiharu Ishikawa
Wuhan University, Wuhan, China
Tieyun Qian
University of Hong Kong, Hong Kong, China
Sarana Nutanong
Monash University, Clayton, Victoria, Australia
Muhammad Aamir Cheema

About this paper

Cite this paper

Perera, K.S., Hahmann, M., Lehner, W., Pedersen, T.B., Thomsen, C. (2015). Modeling Large Time Series for Efficient Approximate Query Processing. In: Liu, A., Ishikawa, Y., Qian, T., Nutanong, S., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9052. Springer, Cham. https://doi.org/10.1007/978-3-319-22324-7_16

DOI: https://doi.org/10.1007/978-3-319-22324-7_16
Published: 30 July 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22323-0
Online ISBN: 978-3-319-22324-7
eBook Packages: Computer Science, Computer Science (R0)