Self-organizing Data Processing for Time Series Using SPARK

  • Conference paper
Mobile Computing and Sustainable Informatics

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 68)

Abstract

Time series data appear in many areas of day-to-day life. As the world moves toward the Internet of Things (IoT) and sensor-based systems, data are generated in enormous volumes. Data from multiple sources must be combined into a single unit before inferences or predictions can be made. As the velocity of data generation increases, a big data perspective needs to be applied to time series. A big data time series representation should be flexible enough to accommodate different time series. When multiple time series are collected from different sources and at different intervals, combining them into a single series for analysis is a challenge. In this paper, we propose a big data approach to representing time series that addresses these challenges and supports analysis of the data. The proposed approach provides an efficient way to combine multiple time series and analyze the result.
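
The challenge of combining series recorded at different intervals can be illustrated with a minimal sketch. It is written in plain Python rather than Spark and is not the method proposed in the paper; the function names (`to_bucket`, `combine`), the bucket width, and the sample readings are all hypothetical and chosen only for illustration. Each reading is assigned to a fixed-width time bucket, readings from the same source within a bucket are averaged, and a source with no reading in a bucket is left as `None`.

```python
from collections import defaultdict


def to_bucket(ts, width):
    """Floor a timestamp (in seconds) to the start of its bucket."""
    return ts - (ts % width)


def combine(series, width):
    """Merge named series of (timestamp, value) pairs onto one time grid.

    Values from one source that fall into the same bucket are averaged.
    A source with no reading in a bucket gets None for that bucket.
    """
    # bucket start -> source name -> [running total, count]
    sums = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for name, points in series.items():
        for ts, value in points:
            cell = sums[to_bucket(ts, width)][name]
            cell[0] += value
            cell[1] += 1

    rows = []
    for bucket in sorted(sums):
        row = {"time": bucket}
        for name in series:
            total, count = sums[bucket].get(name, (0.0, 0))
            row[name] = total / count if count else None
        rows.append(row)
    return rows


# Hypothetical readings: temperature every 30 minutes, humidity every
# 60 minutes, and pressure reported only once (all values are made up).
readings = {
    "temperature": [(0, 30.0), (1800, 31.0), (3600, 32.0)],
    "humidity": [(0, 70.0), (3600, 68.0)],
    "pressure": [(3600, 1008.0)],
}

rows = combine(readings, width=3600)  # one row per hour
for row in rows:
    print(row)
```

Averaging within a bucket is only one possible alignment rule; taking the last observed value or interpolating between neighbours are equally plausible choices, and the abstract does not state which rule the authors use.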

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bharambe, A., Kalbande, D. (2022). Self-organizing Data Processing for Time Series Using SPARK. In: Shakya, S., Bestak, R., Palanisamy, R., Kamel, K.A. (eds) Mobile Computing and Sustainable Informatics. Lecture Notes on Data Engineering and Communications Technologies, vol 68. Springer, Singapore. https://doi.org/10.1007/978-981-16-1866-6_17
