Abstract
Time series data arise in many areas of day-to-day life. As the world moves toward the Internet of Things (IoT) and sensor-based systems, such data are generated in enormous volumes. Data from multiple sources must be combined into a single unit before inferences or predictions can be made. As the velocity of data generation increases, a big data perspective needs to be applied to time series, and the representation used should be flexible enough to accommodate different kinds of time series. When multiple time series are collected from different sources and at different intervals, combining them into a single series for analysis becomes a challenge. In this paper, we propose a big data approach to representing time series that addresses these challenges and supports analysis of the data. The proposed approach provides an efficient way to combine multiple time series and perform analysis on the combined result.
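The difficulty of combining series recorded at different intervals can be illustrated with a small sketch. This is not the approach proposed in the chapter, which the abstract does not detail; it is a minimal single-machine illustration that assumes each series is a list of (timestamp, value) pairs and that averaging within fixed-width buckets is an acceptable way to bring the series onto a common time grid. The function names, the sample readings, and the one-hour bucket width are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bucket_start(ts, width):
    """Round a timestamp down to the start of its fixed-width bucket."""
    whole_buckets = (ts - EPOCH) // width  # integer number of buckets since EPOCH
    return EPOCH + whole_buckets * width


def resample_mean(series, width):
    """Average all values that fall in the same bucket (assumed aggregation)."""
    acc = defaultdict(lambda: [0.0, 0])  # bucket -> [running sum, count]
    for ts, value in series:
        slot = acc[bucket_start(ts, width)]
        slot[0] += value
        slot[1] += 1
    return {bucket: total / count for bucket, (total, count) in acc.items()}


def combine(named_series, width):
    """Join several series on a shared grid; missing readings become None."""
    resampled = {name: resample_mean(s, width) for name, s in named_series.items()}
    buckets = sorted(set().union(*(r.keys() for r in resampled.values())))
    return [
        (bucket, {name: r.get(bucket) for name, r in resampled.items()})
        for bucket in buckets
    ]


# Hypothetical readings: temperature every 30 minutes, humidity every hour.
day = datetime(2020, 12, 15)
temperature = [
    (day + timedelta(hours=10), 30.0),
    (day + timedelta(hours=10, minutes=30), 32.0),
    (day + timedelta(hours=11), 31.0),
]
humidity = [
    (day + timedelta(hours=10), 70.0),
    (day + timedelta(hours=11), 65.0),
    (day + timedelta(hours=12), 60.0),
]

combined = combine(
    {"temperature": temperature, "humidity": humidity},
    width=timedelta(hours=1),
)
for bucket, row in combined:
    print(bucket.isoformat(), row)
```

Each output row holds one timestamp and one value per source, so the sources can be analysed together. A bucket in which a source has no reading is kept with None rather than dropped, which leaves the choice of gap-filling strategy to the later analysis step.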
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bharambe, A., Kalbande, D. (2022). Self-organizing Data Processing for Time Series Using SPARK. In: Shakya, S., Bestak, R., Palanisamy, R., Kamel, K.A. (eds) Mobile Computing and Sustainable Informatics. Lecture Notes on Data Engineering and Communications Technologies, vol 68. Springer, Singapore. https://doi.org/10.1007/978-981-16-1866-6_17
DOI: https://doi.org/10.1007/978-981-16-1866-6_17
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1865-9
Online ISBN: 978-981-16-1866-6
eBook Packages: Intelligent Technologies and Robotics (R0)