Self-organizing Data Processing for Time Series Using SPARK

Bharambe, Asha; Kalbande, Dhananjay

doi:10.1007/978-981-16-1866-6_17

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 68)

Abstract

Time series data appear in many areas of day-to-day life. As the world moves toward the Internet of Things (IoT) and sensors, data are generated in enormous amounts. Data from multiple sources need to be combined into a single unit in order to make inferences or predictions. As the velocity of data generation increases, a big data perspective needs to be applied to time series. A big data time series representation should be flexible enough to accommodate different time series. When multiple time series are collected from different sources and at different intervals, combining them into a single series for analysis is a challenge. In this paper, we propose a big data approach to representing time series that addresses these challenges and helps in the analysis of the data. The proposed approach provides an efficient way to combine multiple time series and perform analysis on them.

Author information

Authors and Affiliations

V.E.S. Institute of Technology, Mumbai, India
Asha Bharambe
Sardar Patel Institute of Technology, Mumbai, India
Dhananjay Kalbande

Editor information

Editors and Affiliations

Institute of Engineering, Tribhuvan University, Kirtipur, Nepal
Subarna Shakya
Czech Technical University in Prague, Praha, Czech Republic
Robert Bestak
Gerald Schwartz School of Business, St. Francis Xavier University, Antigonish, NS, Canada
Ram Palanisamy
Department of Computer Science, Texas Southern University, Houston, TX, USA
Khaled A. Kamel

About this paper

Cite this paper

Bharambe, A., Kalbande, D. (2022). Self-organizing Data Processing for Time Series Using SPARK. In: Shakya, S., Bestak, R., Palanisamy, R., Kamel, K.A. (eds) Mobile Computing and Sustainable Informatics. Lecture Notes on Data Engineering and Communications Technologies, vol 68. Springer, Singapore. https://doi.org/10.1007/978-981-16-1866-6_17

DOI: https://doi.org/10.1007/978-981-16-1866-6_17
Published: 23 July 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-1865-9
Online ISBN: 978-981-16-1866-6
eBook Packages: Intelligent Technologies and Robotics (R0)