Benchmarking and Performance Analysis of Event Sequence Queries on Relational Database

  • Conference paper
Performance Evaluation and Benchmarking for the Era of Artificial Intelligence (TPCTC 2018)

Abstract

The relational database has been the fundamental technology for data-driven decision making based on the histories of event occurrences about the analysis target. Thus, the performance of analytical workloads in relational databases has been studied intensively. As a common language for performance analysis, decision support benchmarks such as TPC-H have been widely used. These benchmarks focus on summarizing event occurrence information; individual event occurrences and inter-occurrence associations are rarely examined. However, queries that do examine them, called event sequence queries in this paper, are becoming important in various real-world applications. Typically, an event sequence query extracts event sequences starting from a small number of interesting event occurrences. In a relational database, these queries are described by multiple self-joins on the whole sequence of events. Furthermore, each pair of events to be joined tends to have a strong correlation in the timestamp attribute, resulting in heavily skewed join workloads. Despite their usefulness in real-world data analysis, very little work has been done on the performance analysis of event sequence queries.

In this paper, we present the initial design of the ESQUE benchmark, a benchmark for event sequence queries. We then give experimental results comparing database system implementations (PostgreSQL vs. MySQL) and comparing historical versions of PostgreSQL. The performance analysis shows that the ESQUE benchmark allows us to discover performance problems that had been overlooked by existing benchmarks.
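As an illustration of the query shape described in the abstract, the following minimal sketch expresses a two-step event sequence as a self-join with a timestamp window. The `events` table, its columns, the event types, and the 30-day window are all assumptions made for this example; they are not the ESQUE schema or any of its queries. SQLite is used only so that the sketch is self-contained.

```python
import sqlite3


def find_followups(conn, start_type, next_type, window_days):
    """Pair each starting event with later events of the same entity.

    The join condition ties the second event's timestamp to the first
    one's, which is the timestamp correlation the abstract refers to.
    """
    sql = """
        SELECT e1.entity_id, e1.ts, e2.ts
        FROM events AS e1
        JOIN events AS e2
          ON e2.entity_id = e1.entity_id
         AND e2.ts > e1.ts
         AND e2.ts <= date(e1.ts, '+' || ? || ' days')
        WHERE e1.event_type = ? AND e2.event_type = ?
        ORDER BY e1.entity_id, e1.ts, e2.ts
    """
    return conn.execute(sql, (window_days, start_type, next_type)).fetchall()


def demo():
    """Build a tiny hypothetical event table and run the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events (entity_id INTEGER, event_type TEXT, ts TEXT)"
    )
    conn.executemany(
        "INSERT INTO events VALUES (?, ?, ?)",
        [
            (1, "order", "1994-01-01"),
            (1, "return", "1994-01-10"),
            (1, "return", "1994-06-01"),  # outside the 30-day window
            (2, "order", "1994-01-05"),
            (2, "return", "1994-01-06"),
            (3, "order", "1994-02-01"),  # no follow-up event at all
        ],
    )
    return find_followups(conn, "order", "return", 30)
```

Longer sequences would add one more self-join per step, each constrained relative to the previous event's timestamp, which is where the skewed join workloads mentioned above would be expected to arise.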

Notes

  1. Queries 25 and 29 defined in TPC-DS [17] join multiple relations recording event occurrences, but these queries do not clearly consider semantic relations between individual event occurrences.

  2. Although it might be better to call it a relationship between event occurrences, we use the term connection to avoid confusion with “relation” or “relational” in this paper.

  3. shared_buffers (PostgreSQL) and innodb_buffer_pool_size (MySQL) were configured.

  4. PostgreSQL’s bitmap index scan is a search algorithm on B\(^+\)-trees, not an index data structure using bitmaps. In PostgreSQL’s implementation, a bitmap index scan fetches only record pointers from B\(^+\)-tree indexes, sorts the record pointers by block address, and then fetches the records from the table.

  5. When multiple B\(^+\)-tree indexes are available on a single table, PostgreSQL’s optimizer may choose a query execution plan that uses a bitmap index scan on multiple B\(^+\)-tree indexes. We call this a multi-index bitmap index scan. A multi-index bitmap index scan first searches multiple B\(^+\)-tree indexes, computes the intersection of the record pointers, and then fetches the records.

  6. We omitted TPC-H Q.4, Q.20, and Q.21 from the measurements in this experiment because these queries did not finish within 24 hours on PostgreSQL 8.4 and 9.2.

  7. Parallel sequential scan was first introduced in PostgreSQL 9.6, but the query optimizer of PostgreSQL 9.6 did not select it for any query in this experiment.
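The procedure described in notes 4 and 5 can be sketched as a small simulation. This is a simplified model under stated assumptions, not PostgreSQL’s implementation: the table is a list of blocks, a sorted list of (key, pointer) pairs stands in for a B\(^+\)-tree, and all function and column names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# A record pointer is modelled as (block_number, offset_in_block).


def build_index(table, column):
    """Return a sorted list of (key, pointer) pairs standing in for an index."""
    entries = []
    for block_no, block in enumerate(table):
        for offset, row in enumerate(block):
            entries.append((row[column], (block_no, offset)))
    entries.sort()
    return entries


def index_range(index, low, high):
    """Collect pointers for keys in [low, high] without touching the table."""
    keys = [key for key, _ in index]
    start = bisect_left(keys, low)
    stop = bisect_right(keys, high)
    return {pointer for _, pointer in index[start:stop]}


def bitmap_scan(table, pointer_sets):
    """Intersect pointer sets, sort by block address, then fetch records."""
    pointers = set.intersection(*pointer_sets)
    rows = []
    blocks_read = []  # each visited block once, in ascending order
    for block_no, offset in sorted(pointers):
        if not blocks_read or blocks_read[-1] != block_no:
            blocks_read.append(block_no)
        rows.append(table[block_no][offset])
    return rows, blocks_read


def example():
    """Run a two-index scan on a tiny hypothetical table of three blocks."""
    table = [
        [{"cust": 7, "day": 3}, {"cust": 1, "day": 9}],
        [{"cust": 7, "day": 5}, {"cust": 2, "day": 4}],
        [{"cust": 7, "day": 20}, {"cust": 7, "day": 4}],
    ]
    by_cust = index_range(build_index(table, "cust"), 7, 7)
    by_day = index_range(build_index(table, "day"), 3, 5)
    return bitmap_scan(table, [by_cust, by_day])
```

Passing a single pointer set to `bitmap_scan` models the single-index case of note 4; passing two models the multi-index case of note 5. Sorting the pointers before fetching means each block is visited once and in address order, which is the point of the technique.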

References

  1. Agrawal, R., Srikant, R.: Mining sequential patterns. In: Proceedings of the Eleventh International Conference on Data Engineering, ICDE 1995, pp. 3–14. IEEE Computer Society, Washington, DC (1995)

  2. Alagiannis, I., Borovica, R., Branco, M., Idreos, S., Ailamaki, A.: NoDB: efficient query execution on raw data files. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, SIGMOD 2012, pp. 241–252. ACM, New York (2012)

  3. Ayres, J., Flannick, J., Gehrke, J., Yiu, T.: Sequential PAttern mining using a bitmap representation. In: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2002, pp. 429–435. ACM, New York (2002)

  4. Boncz, P., Anatiotis, A.-C., Kläbe, S.: JCC-H: adding join crossing correlations with skew to TPC-H. In: Nambiar, R., Poess, M. (eds.) TPCTC 2017. LNCS, vol. 10661, pp. 103–119. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-72401-0_8

  5. Boncz, P., Neumann, T., Erling, O.: TPC-H analyzed: hidden messages and lessons learned from an influential benchmark. In: Nambiar, R., Poess, M. (eds.) TPCTC 2013. LNCS, vol. 8391, pp. 61–76. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-04936-6_5

  6. Chan, K.P., Fu, A.W.C.: Efficient time series matching by wavelets. In: Proceedings 15th International Conference on Data Engineering (Cat. No. 99CB36337), pp. 126–133, March 1999

  7. Chaudhuri, S.: An overview of query optimization in relational systems. In: Proceedings of the Seventeenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, PODS 1998, pp. 34–43. ACM, New York (1998)

  8. Dietterich, T.G., Michalski, R.S.: Discovering patterns in sequences of events. Artif. Intell. 25(2), 187–232 (1985)

  9. Hacigumus, H., Chi, Y., Wu, W., Zhu, S., Tatemura, J., Naughton, J.F.: Predicting query execution time: are optimizer cost models really unusable? In: Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013), ICDE 2013, pp. 1081–1092. IEEE Computer Society, Washington, DC (2013)

  10. Han, J., Dong, G., Yin, Y.: Efficient mining of partial periodic patterns in time series database. In: Proceedings 15th International Conference on Data Engineering (Cat. No. 99CB36337), pp. 106–115, March 1999

  11. Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M.C.: FreeSpan: frequent pattern-projected sequential pattern mining. In: Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2000, pp. 355–359. ACM, New York (2000)

  12. Keogh, E.: Exact indexing of dynamic time warping. In: Proceedings of the 28th International Conference on Very Large Data Bases, VLDB 2002, pp. 406–417. VLDB Endowment (2002)

  13. Law, Y.N., Wang, H., Zaniolo, C.: Query languages and data models for database sequences and data streams. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, VLDB 2004, pp. 492–503. VLDB Endowment (2004)

  14. Leis, V., Gubichev, A., Mirchev, A., Boncz, P., Kemper, A., Neumann, T.: How good are query optimizers, really? Proc. VLDB Endow. 9(3), 204–215 (2015)

  15. Moerkotte, G., Neumann, T., Steidl, G.: Preventing bad plans by bounding the impact of cardinality estimation errors. Proc. VLDB Endow. 2(1), 982–993 (2009)

  16. Moussa, R.: Big-SeqDB-Gen: a formal and scalable approach for parallel generation of big synthetic sequence databases. In: Nambiar, R., Poess, M. (eds.) TPCTC 2015. LNCS, vol. 9508, pp. 61–76. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-31409-9_5

  17. Nambiar, R.O., Poess, M.: The making of TPC-DS. In: Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1049–1058. VLDB Endowment (2006)

  18. O’Neil, P., O’Neil, E., Chen, X., Revilak, S.: The star schema benchmark and augmented fact table indexing. In: Nambiar, R., Poess, M. (eds.) TPCTC 2009. LNCS, vol. 5895, pp. 237–252. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-10424-4_17

  19. O’Neil, P.E., O’Neil, E.J., Chen, X.: The star schema benchmark (SSB). Pat 200, 50 (2007)

  20. Poess, M., Floyd, C.: New TPC benchmarks for decision support and web commerce. SIGMOD Rec. 29(4), 64–71 (2000)

  21. Rafiei, D., Mendelzon, A.O.: Querying time series data based on similarity. IEEE Trans. Knowl. Data Eng. 12(5), 675–693 (2000)

  22. Ramakrishnan, R., Donjerkovic, D., Ranganathan, A., Beyer, K.S., Krishnaprasad, M.: SRQL: sorted relational query language. In: Proceedings of Tenth International Conference on Scientific and Statistical Database Management (Cat. No. 98TB100243), pp. 84–95, July 1998

  23. Ray, S., Simion, B., Brown, A.D.: Jackpine: a benchmark to evaluate spatial database performance. In: 2011 IEEE 27th International Conference on Data Engineering, pp. 1139–1150, April 2011

  24. Reinsel, D., Gantz, J., Rydning, J.: Data Age 2025: The Evolution of Data to Life-Critical. Don’t Focus on Big Data (2017)

  25. Sadri, R., Zaniolo, C., Zarkesh, A., Adibi, J.: Optimization of sequence queries in database systems. In: Proceedings of the Twentieth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2001, pp. 71–81. ACM, New York (2001)

  26. Seshadri, P., Livny, M., Ramakrishnan, R.: SEQ: a model for sequence databases. In: Proceedings of the Eleventh International Conference on Data Engineering, pp. 232–239, March 1995

  27. Seshadri, P., Livny, M., Ramakrishnan, R.: Sequence query processing. In: Proceedings of the 1994 ACM SIGMOD International Conference on Management of Data, SIGMOD 1994, pp. 430–441. ACM, New York (1994)

  28. Snodgrass, R.: The TSQL2 Temporal Query Language. The Springer International Series in Engineering and Computer Science. Springer, New York (2012). https://doi.org/10.1007/978-1-4615-2289-8

  29. Srivastava, J., Cooley, R., Deshpande, M., Tan, P.N.: Web usage mining: discovery and applications of usage patterns from web data. SIGKDD Explor. Newsl. 1(2), 12–23 (2000)

  30. Yi, B.K., Jagadish, H.V., Faloutsos, C.: Efficient retrieval of similar time sequences under time warping. In: Proceedings 14th International Conference on Data Engineering, pp. 201–208, February 1998


Acknowledgment

This paper is in part based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Author information

Corresponding author

Correspondence to Yuto Hayamizu.

Appendix. SQL Queries in ESQUE Benchmark

1.1 ESQ.1

figure a

Default values: [MONTHS] = 3, [CUSTKEY] = 10000, [DATE] = 1994-1-1

1.2 ESQ.2

figure b

Default values: [MONTHS] = 3, [CUSTKEY] = 150, [DATE] = 1994-1-1

1.3 ESQ.3

figure c

Default values: [MONTHS] = 6, [CUSTKEY] = 1000, [DATE] = 1994-1-1, [BRAND1] = Brand#11, [BRAND2] = Brand#21, [BRAND3] = Brand#31

1.4 ESQ.4

figure d

Default values: [MONTHS] = 1, [CUSTKEY] = 10000, [DATE] = 1994-1-1

1.5 ESQ.5

figure e

Default values: [MONTHS] = 1, [CUSTKEY] = 25000, [DATE] = 1994-1-1

1.6 ESQ.6

figure f

Default values: [MONTHS] = 1, [CUSTKEY] = 500, [DATE] = 1994-1-1

1.7 ESQ.7

figure g

Default values: [MONTHS] = 1, [CUSTKEY] = 6000, [DATE] = 1994-1-1
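The default values listed above can be collected in one table and substituted into a query template by placeholder name. The sketch below shows one way to do that. The values are copied from the lists above, but the helper `fill_template` and the template string in the usage note are hypothetical; the SQL text of the ESQUE queries is not reproduced here and is not reconstructed.

```python
import re

# Default parameter values as listed for each query in this appendix.
ESQUE_DEFAULTS = {
    "ESQ.1": {"MONTHS": "3", "CUSTKEY": "10000", "DATE": "1994-1-1"},
    "ESQ.2": {"MONTHS": "3", "CUSTKEY": "150", "DATE": "1994-1-1"},
    "ESQ.3": {
        "MONTHS": "6",
        "CUSTKEY": "1000",
        "DATE": "1994-1-1",
        "BRAND1": "Brand#11",
        "BRAND2": "Brand#21",
        "BRAND3": "Brand#31",
    },
    "ESQ.4": {"MONTHS": "1", "CUSTKEY": "10000", "DATE": "1994-1-1"},
    "ESQ.5": {"MONTHS": "1", "CUSTKEY": "25000", "DATE": "1994-1-1"},
    "ESQ.6": {"MONTHS": "1", "CUSTKEY": "500", "DATE": "1994-1-1"},
    "ESQ.7": {"MONTHS": "1", "CUSTKEY": "6000", "DATE": "1994-1-1"},
}


def fill_template(template, params):
    """Replace each [NAME] placeholder; raise KeyError if a value is missing."""

    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"no value for placeholder [{name}]")
        return params[name]

    return re.sub(r"\[([A-Z0-9]+)\]", substitute, template)
```

For example, `fill_template("custkey=[CUSTKEY]; date=[DATE]; months=[MONTHS]", ESQUE_DEFAULTS["ESQ.2"])` fills a placeholder string that is not SQL and serves only to show the substitution. When running real queries with values from untrusted input, bound parameters are generally a safer choice than text substitution.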

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Hayamizu, Y., Kawamichi, R., Goda, K., Kitsuregawa, M. (2019). Benchmarking and Performance Analysis of Event Sequence Queries on Relational Database. In: Nambiar, R., Poess, M. (eds) Performance Evaluation and Benchmarking for the Era of Artificial Intelligence. TPCTC 2018. Lecture Notes in Computer Science, vol 11135. Springer, Cham. https://doi.org/10.1007/978-3-030-11404-6_9

  • DOI: https://doi.org/10.1007/978-3-030-11404-6_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-11403-9

  • Online ISBN: 978-3-030-11404-6

  • eBook Packages: Computer Science (R0)
