Optimization of Row Pattern Matching over Sequence Data in Spark SQL

  • Conference paper
Database and Expert Systems Applications (DEXA 2019)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11706)

Abstract

Advances in information, communications, and sensor technology mean that large quantities of sequence data (time series data, log data, etc.) are generated and processed every day. Row pattern matching over sequence data stored in relational databases was standardized as SQL/RPR in 2016. Today, in addition to relational databases, there are many frameworks for processing large amounts of data in parallel and distributed computing environments, including MapReduce and Spark. Hive and Spark SQL let us code data analysis processes in SQL-like query languages, and row pattern matching is beneficial there as well. However, the computational cost of row pattern matching is high, so the process needs to be made efficient. In this paper, we propose two optimization methods that reduce the computational cost of row pattern matching. We focus on Spark and present the design and implementation of the proposed methods for Spark SQL. Experiments verify that our optimization methods reduce the processing time of Spark SQL queries that include row pattern matching.
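For readers unfamiliar with row pattern matching, the following Python sketch illustrates what such a query computes. It is not the optimization proposed in the paper and does not use Spark. The (symbol, time, price) schema, the V-shape pattern, and the function names `label_rows` and `find_v_shapes` are assumptions made purely for illustration. Each row is labeled by a condition on its predecessor, and a regular expression over the labels then plays the role of the row pattern.

```python
import re
from itertools import groupby
from operator import itemgetter


def label_rows(prices):
    """Label each row by comparing it with the previous row in the partition.

    'D' = price fell, 'U' = price rose, 'S' = first row or unchanged.
    This loosely mirrors the per-variable conditions of a DEFINE clause.
    """
    labels = []
    for i, price in enumerate(prices):
        if i > 0 and price < prices[i - 1]:
            labels.append("D")
        elif i > 0 and price > prices[i - 1]:
            labels.append("U")
        else:
            labels.append("S")
    return "".join(labels)


def find_v_shapes(rows):
    """Find V-shaped price movements in (symbol, time, price) rows.

    Returns a list of (symbol, start_time, end_time), one per match.
    """
    matches = []
    # Analogous to PARTITION BY symbol ORDER BY time.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for symbol, group in groupby(ordered, key=itemgetter(0)):
        part = list(group)
        labels = label_rows([row[2] for row in part])
        # Analogous to PATTERN (STRT DOWN+ UP+): any start row, then one or
        # more falling rows, then one or more rising rows. re.finditer yields
        # non-overlapping matches, similar to AFTER MATCH SKIP PAST LAST ROW.
        for m in re.finditer(r".D+U+", labels):
            matches.append((symbol, part[m.start()][1], part[m.end() - 1][1]))
    return matches


# Hypothetical sample data: (symbol, time, price).
rows = [
    ("A", 1, 10), ("A", 2, 8), ("A", 3, 7), ("A", 4, 9), ("A", 5, 12), ("A", 6, 11),
    ("B", 1, 5), ("B", 2, 6), ("B", 3, 4), ("B", 4, 4), ("B", 5, 3), ("B", 6, 5),
]
print(find_v_shapes(rows))  # [('A', 1, 5), ('B', 4, 6)]
```

Even in this toy form, every partition must be sorted and scanned in full before any match is reported, which hints at why the cost of row pattern matching becomes a concern on large sequence data.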

Acknowledgement

This work was partly supported by Grant-in-Aid for Scientific Research (B) (#19H04114) from JSPS.

Author information

Corresponding authors

Correspondence to Kosuke Nakabasami, Hiroyuki Kitagawa or Yuya Nasu.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Nakabasami, K., Kitagawa, H., Nasu, Y. (2019). Optimization of Row Pattern Matching over Sequence Data in Spark SQL. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science, vol 11706. Springer, Cham. https://doi.org/10.1007/978-3-030-27615-7_1

  • DOI: https://doi.org/10.1007/978-3-030-27615-7_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27614-0

  • Online ISBN: 978-3-030-27615-7

  • eBook Packages: Computer Science, Computer Science (R0)
