Searching for Patterns in Sequential Data: Functionality and Performance Assessment of Commercial and Open-Source Systems

  • Witold Andrzejewski
  • Bartosz Bębel
  • Szymon Kłosowski
  • Bartosz Łukaszewski
  • Robert Wrembel
  • Gastón Bakkalian
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9975)

Abstract

Ubiquitous devices and applications generate data that are naturally ordered by time. Thus, elementary data items can form sequences. The most popular way of analyzing sequences is searching for patterns. To this end, sequential pattern discovery techniques were proposed in some research contributions and implemented in a few database systems, e.g., Oracle Database, Teradata Aster, and Apache Hive. The goal of this work is to assess the functionality of these systems and to evaluate their performance with respect to pattern queries.

Keywords

  • Query Language
  • Pattern Query
  • Complex Event Processing
  • Travel Length
  • Query Execution Time
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Notes

Acknowledgement

The research of G. Bakkalian has been funded by the European Commission through the “Erasmus Mundus Joint Doctorate Information Technologies for Business Intelligence Doctoral College (IT4BI-DC)”. The research of W. Andrzejewski, B. Bębel, and R. Wrembel has been funded by the Polish National Science Center, grant “Analytical processing and mining of sequential data: models, algorithms, and data structures”.

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Witold Andrzejewski (1)
  • Bartosz Bębel (1)
  • Szymon Kłosowski (1)
  • Bartosz Łukaszewski (1)
  • Robert Wrembel (1)
  • Gastón Bakkalian (1)

  1. Institute of Computing Science, Poznan University of Technology, Poznań, Poland
