Benchmarking and Performance Analysis of Event Sequence Queries on Relational Database

Hayamizu, Yuto; Kawamichi, Ryoji; Goda, Kazuo; Kitsuregawa, Masaru

doi:10.1007/978-3-030-11404-6_9

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 11135)

Included in the following conference series:

Technology Conference on Performance Evaluation and Benchmarking

Abstract

Relational databases have long been a fundamental technology for data-driven decision making based on the history of event occurrences related to an analysis target. The performance of analytical workloads in relational databases has therefore been studied intensively. Decision support benchmarks such as TPC-H have been widely used as a common language for performance analysis. These benchmarks focus on summarizing event occurrence information; individual event occurrences and the associations between occurrences are rarely examined. However, queries of the latter type, called event sequence queries in this paper, are becoming important in various real-world applications. Typically, an event sequence query extracts event sequences starting from a small number of interesting event occurrences. In a relational database, these queries are expressed as multiple self-joins on the whole sequence of events. Furthermore, each pair of events to be joined tends to have a strong correlation in the timestamp attribute, resulting in heavily skewed join workloads. Despite their usefulness in real-world data analysis, very little work has been done on the performance analysis of event sequence queries.

In this paper, we present the initial design of the ESQUE benchmark, a benchmark for event sequence queries. We then give experimental results comparing database system implementations (PostgreSQL vs. MySQL) and comparing historical versions of PostgreSQL. The performance analysis shows that the ESQUE benchmark allows us to discover performance problems that had been overlooked by existing benchmarks.
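
To make the shape of such a query concrete, the following is a minimal sketch of an event sequence query written as two self-joins, using Python's built-in sqlite3 module. The `events` table, its columns, the event types, and the rule that each event must follow the previous one within a window of months are all assumptions made for illustration; they are not the ESQUE schema or any of its queries.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical event table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (custkey INTEGER, event_type TEXT, event_date TEXT)"
    )
    rows = [
        (1, "order", "1994-01-05"),
        (1, "ship", "1994-01-20"),
        (1, "return", "1994-02-10"),
        (2, "order", "1994-01-07"),
        (2, "ship", "1994-06-01"),    # outside the window after the order
        (3, "order", "1993-12-01"),   # starts before the date range
        (3, "ship", "1993-12-15"),
        (3, "return", "1994-01-02"),
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
    return conn

# The same table is joined with itself twice: e1 is the starting event,
# e2 and e3 are the events that follow it for the same customer.
# Every join condition constrains the timestamp relative to the previous event.
SEQUENCE_QUERY = """
SELECT e1.custkey, e1.event_date, e2.event_date, e3.event_date
FROM events AS e1
JOIN events AS e2
  ON e2.custkey = e1.custkey
 AND e2.event_date > e1.event_date
 AND e2.event_date <= date(e1.event_date, '+' || :months || ' months')
JOIN events AS e3
  ON e3.custkey = e2.custkey
 AND e3.event_date > e2.event_date
 AND e3.event_date <= date(e2.event_date, '+' || :months || ' months')
WHERE e1.event_type = 'order'
  AND e2.event_type = 'ship'
  AND e3.event_type = 'return'
  AND e1.event_date >= :start
  AND e1.event_date < date(:start, '+' || :months || ' months')
ORDER BY e1.custkey, e1.event_date
"""

def find_sequences(conn, start, months):
    """Return order -> ship -> return sequences that begin in the given range."""
    params = {"start": start, "months": months}
    return conn.execute(SEQUENCE_QUERY, params).fetchall()

if __name__ == "__main__":
    for row in find_sequences(build_demo_db(), "1994-01-01", 3):
        print(row)
```

Only the starting events are selective here; the joined events are found through timestamp ranges tied to the previous event, which is the kind of timestamp correlation the abstract refers to.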

Notes

1. Queries 25 and 29 defined in TPC-DS [17] join multiple relations recording event occurrences, but these queries do not clearly consider semantic relations between individual event occurrences.
2. Although it might be better to call it a relationship between event occurrences, we use the term “connection” in this paper to avoid confusion with “relation” or “relational”.
3. shared_buffers (PostgreSQL) and innodb_buffer_pool_size (MySQL) were configured.
4. PostgreSQL’s bitmap index scan is a search algorithm on B+-trees, not an index data structure using bitmaps. In PostgreSQL’s implementation, a bitmap index scan fetches only record pointers from B+-tree indexes, sorts the record pointers by block address, and then fetches the records from the table.
5. When multiple B+-tree indexes are available on a single table, PostgreSQL’s optimizer may choose a query execution plan that uses a bitmap index scan on multiple B+-tree indexes. We call this a multi-index bitmap index scan. A multi-index bitmap index scan first searches the multiple B+-tree indexes, computes the intersection of the record pointers, and then fetches the records.
6. We omitted TPC-H Q.4, Q.20, and Q.21 from the measurements in this experiment because these queries did not finish within 24 h on PostgreSQL 8.4 and 9.2.
7. Parallel sequential scan was first introduced in PostgreSQL 9.6, but the query optimizer of PostgreSQL 9.6 did not select it for any query in this experiment.
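
The steps described in notes 4 and 5 can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not PostgreSQL's implementation: a sorted list stands in for a B+-tree, a table is a list of blocks, a record pointer is a hypothetical `(block, slot)` pair, and all function names are invented for illustration.

```python
from bisect import bisect_left, bisect_right

def build_index(table, column):
    """Build a sorted list of (key, pointer) pairs as a stand-in for a B+-tree."""
    entries = [
        (record[column], (block_no, slot))
        for block_no, block in enumerate(table)
        for slot, record in enumerate(block)
    ]
    entries.sort()
    return entries

def index_lookup(index, low, high):
    """Return only the record pointers whose key lies in [low, high]."""
    keys = [key for key, _ in index]
    start = bisect_left(keys, low)
    end = bisect_right(keys, high)
    return [pointer for _, pointer in index[start:end]]

def bitmap_scan(table, pointer_lists):
    """Intersect pointer lists, sort by block address, then fetch the records."""
    pointers = set(pointer_lists[0])
    for more in pointer_lists[1:]:
        pointers &= set(more)      # multi-index case: keep common pointers only
    ordered = sorted(pointers)     # (block, slot) order means block address order
    return [table[block_no][slot] for block_no, slot in ordered]

if __name__ == "__main__":
    # Three blocks of a hypothetical table.
    table = [
        [{"custkey": 1, "month": 1}, {"custkey": 2, "month": 1}],
        [{"custkey": 1, "month": 2}, {"custkey": 3, "month": 1}],
        [{"custkey": 1, "month": 1}],
    ]
    by_custkey = build_index(table, "custkey")
    by_month = build_index(table, "month")
    # Single-index scan: custkey = 1
    print(bitmap_scan(table, [index_lookup(by_custkey, 1, 1)]))
    # Multi-index scan: custkey = 1 AND month = 1
    print(bitmap_scan(table, [index_lookup(by_custkey, 1, 1),
                              index_lookup(by_month, 1, 1)]))
```

Sorting the pointers before fetching means each block is visited once and in address order, which is the point of separating the pointer lookup from the record fetch.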

References

1. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, ICDE 1995, pp. 3–14. IEEE Computer Society, Washington, DC (1995)
2. Alagiannis, I., Borovica, R., Branco, M., Idreos, S., Ailamaki, A.: NoDB: efficient query execution on raw data files. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, pp. 241–252. ACM, New York (2012)
3. Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential PAttern mining using a bitmap representation. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 429–435. ACM, New York (2002)
4. Boncz, P., Anatiotis, A.-C., Kläbe, S.: JCC-H: adding join crossing correlations with skew to TPC-H. In: Nambiar, R., Poess, M. (eds.) TPCTC 2017. LNCS, vol. 10661, pp. 103–119. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-72401-0_8
5. Boncz, P., Neumann, T., Erling, O.: TPC-H analyzed: hidden messages and lessons learned from an influential benchmark. In: Nambiar, R., Poess, M. (eds.) TPCTC 2013. LNCS, vol. 8391, pp. 61–76. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04936-6_5
6. Chan, K.P., Fu, A.W.C.: Efficient time series matching by wavelets. In: Proceedings 15th International Conference on Data Engineering (Cat. No. 99CB36337), pp. 126–133, March 1999
7. Chaudhuri, S.: An overview of query optimization in relational systems. In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS 1998, pp. 34–43. ACM, New York (1998)
8. Dietterich, T.G., Michalski, R.S.: Discovering patterns in sequences of events. Artif. Intell. 25(2), 187–232 (1985)
9. Hacigumus, H., Chi, Y., Wu, W., Zhu, S., Tatemura, J., Naughton, J.F.: Predicting query execution time: are optimizer cost models really unusable? In: Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013), ICDE 2013, pp. 1081–1092. IEEE Computer Society, Washington, DC (2013)
10. Han, J., Dong, G., Yin, Y.: Efficient mining of partial periodic patterns in time series database. In: Proceedings 15th International Conference on Data Engineering (Cat. No. 99CB36337), pp. 106–115, March 1999
11. Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.C.: FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2000, pp. 355–359. ACM, New York (2000)
12. Keogh, E.: Exact indexing of dynamic time warping. In: Proceedings of the 28th International Conference on Very Large Data Bases, VLDB 2002, pp. 406–417. VLDB Endowment (2002)
13. Law, Y.N., Wang, H., Zaniolo, C.: Query languages and data models for database sequences and data streams. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, VLDB 2004, pp. 492–503. VLDB Endowment (2004)
14. Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., Neumann, T.: How good are query optimizers, really? Proc. VLDB Endow. 9(3), 204–215 (2015)
15. Moerkotte, G., Neumann, T., Steidl, G.: Preventing bad plans by bounding the impact of cardinality estimation errors. Proc. VLDB Endow. 2(1), 982–993 (2009)
16. Moussa, R.: Big-SeqDB-Gen: a formal and scalable approach for parallel generation of big synthetic sequence databases. In: Nambiar, R., Poess, M. (eds.) TPCTC 2015. LNCS, vol. 9508, pp. 61–76. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31409-9_5
17. Nambiar, R.O., Poess, M.: The making of TPC-DS. In: Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1049–1058. VLDB Endowment (2006)
18. O’Neil, P., O’Neil, E., Chen, X., Revilak, S.: The star schema benchmark and augmented fact table indexing. In: Nambiar, R., Poess, M. (eds.) TPCTC 2009. LNCS, vol. 5895, pp. 237–252. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10424-4_17
19. O’Neil, P.E., O’Neil, E.J., Chen, X.: The star schema benchmark (SSB). Pat 200, 50 (2007)
20. Poess, M., Floyd, C.: New TPC benchmarks for decision support and web commerce. SIGMOD Rec. 29(4), 64–71 (2000)
21. Rafiei, D., Mendelzon, A.O.: Querying time series data based on similarity. IEEE Trans. Knowl. Data Eng. 12(5), 675–693 (2000)
22. Ramakrishnan, R., Donjerkovic, D., Ranganathan, A., Beyer, K.S., Krishnaprasad, M.: SRQL: sorted relational query language. In: Proceedings of Tenth International Conference on Scientific and Statistical Database Management (Cat. No. 98TB100243), pp. 84–95, July 1998
23. Ray, S., Simion, B., Brown, A.D.: Jackpine: a benchmark to evaluate spatial database performance. In: 2011 IEEE 27th International Conference on Data Engineering, pp. 1139–1150, April 2011
24. Reinsel, D., Gantz, J., Rydning, J.: Data Age 2025: The Evolution of Data to Life-Critical. Don’t Focus on Big Data (2017)
25. Sadri, R., Zaniolo, C., Zarkesh, A., Adibi, J.: Optimization of sequence queries in database systems. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2001, pp. 71–81. ACM, New York (2001)
26. Seshadri, P., Livny, M., Ramakrishnan, R.: SEQ: a model for sequence databases. In: Proceedings of the Eleventh International Conference on Data Engineering, pp. 232–239, March 1995
27. Seshadri, P., Livny, M., Ramakrishnan, R.: Sequence query processing. In: Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, SIGMOD 1994, pp. 430–441. ACM, New York (1994)
28. Snodgrass, R.: The TSQL2 Temporal Query Language. The Springer International Series in Engineering and Computer Science. Springer, New York (2012). https://doi.org/10.1007/978-1-4615-2289-8
29. Srivastava, J., Cooley, R., Deshpande, M., Tan, P.N.: Web usage mining: discovery and applications of usage patterns from web data. SIGKDD Explor. Newsl. 1(2), 12–23 (2000)
30. Yi, B.K., Jagadish, H.V., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In: Proceedings 14th International Conference on Data Engineering, pp. 201–208, February 1998

Acknowledgment

This paper is in part based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Author information

Authors and Affiliations

Institute of Industrial Science, The University of Tokyo, Tokyo, Japan
Yuto Hayamizu, Ryoji Kawamichi, Kazuo Goda & Masaru Kitsuregawa
National Institute of Informatics, Tokyo, Japan
Masaru Kitsuregawa

Corresponding author

Correspondence to Yuto Hayamizu.

Editor information

Editors and Affiliations

Advanced Micro Devices, Inc., Santa Clara, CA, USA
Raghunath Nambiar
Oracle Corporation, Redwood Shores, CA, USA
Meikel Poess

Appendix. SQL Queries in ESQUE Benchmark

1.1 ESQ.1

Default values: [MONTHS] = 3, [CUSTKEY] = 10000, [DATE] = 1994-1-1

1.2 ESQ.2

Default values: [MONTHS] = 3, [CUSTKEY] = 150, [DATE] = 1994-1-1

1.3 ESQ.3

Default values: [MONTHS] = 6, [CUSTKEY] = 1000, [DATE] = 1994-1-1, [BRAND1] = Brand#11, [BRAND2] = Brand#21, [BRAND3] = Brand#31

1.4 ESQ.4

Default values: [MONTHS] = 1, [CUSTKEY] = 10000, [DATE] = 1994-1-1

1.5 ESQ.5

Default values: [MONTHS] = 1, [CUSTKEY] = 25000, [DATE] = 1994-1-1

1.6 ESQ.6

Default values: [MONTHS] = 1, [CUSTKEY] = 500, [DATE] = 1994-1-1

1.7 ESQ.7

Default values: [MONTHS] = 1, [CUSTKEY] = 6000, [DATE] = 1994-1-1

About this paper

Cite this paper

Hayamizu, Y., Kawamichi, R., Goda, K., Kitsuregawa, M. (2019). Benchmarking and Performance Analysis of Event Sequence Queries on Relational Database. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking for the Era of Artificial Intelligence. TPCTC 2018. Lecture Notes in Computer Science, vol 11135. Springer, Cham. https://doi.org/10.1007/978-3-030-11404-6_9

DOI: https://doi.org/10.1007/978-3-030-11404-6_9
Published: 30 January 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11403-9
Online ISBN: 978-3-030-11404-6
eBook Packages: Computer Science, Computer Science (R0)
