A Nearest Neighbours-Based Algorithm for Big Time Series Data Forecasting

Talavera-Llames, Ricardo L.; Pérez-Chacón, Rubén; Martínez-Ballesteros, María; Troncoso, Alicia; Martínez-Álvarez, Francisco

doi:10.1007/978-3-319-32034-2_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9648)

Included in the following conference series:

International Conference on Hybrid Artificial Intelligence Systems

Abstract

This work presents a forecasting algorithm for big data time series. A nearest neighbours-based strategy forms the core of the algorithm. A detailed explanation of how to adapt and implement the algorithm to handle big data is provided. Although some parts remain iterative, and consequently require an enhanced implementation, execution times are considered satisfactory. The performance of the proposed approach has been tested on real-world electricity consumption data from a public Spanish university, using a Spark cluster.
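The general idea of nearest neighbours-based forecasting can be illustrated with a minimal, single-machine sketch. This is not the authors' Spark implementation, which the abstract does not detail. The function name `knn_forecast`, the parameters `w` (window length), `h` (forecast horizon) and `k` (number of neighbours), the Euclidean distance, and the inverse-squared-distance weighting are all assumptions chosen for illustration.

```python
import math


def knn_forecast(series, w, h, k):
    """Forecast the next h values from the k past windows most similar
    to the last w values of the series (illustrative sketch only)."""
    n = len(series)
    if n < w + h:
        raise ValueError("series too short for the chosen w and h")

    # The query is the most recent window of length w.
    query = series[n - w:]

    # Every earlier window that is followed by h known values is a candidate.
    candidates = []
    for i in range(n - w - h + 1):
        window = series[i:i + w]
        future = series[i + w:i + w + h]
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, query)))
        candidates.append((dist, future))

    # Keep the k candidates closest to the query.
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]

    # Assumed weighting: inverse squared distance; the small constant
    # avoids division by zero when a window matches the query exactly.
    weights = [1.0 / (d * d + 1e-12) for d, _ in nearest]
    total = sum(weights)

    # The forecast is the weighted average of the values that followed
    # each neighbouring window.
    return [
        sum(wt * fut[j] for wt, (_, fut) in zip(weights, nearest)) / total
        for j in range(h)
    ]
```

The loop over candidate windows is the step that would need to be distributed for big data, since each distance computation is independent of the others.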

Acknowledgments

The authors would like to thank the Spanish Ministry of Economy and Competitiveness, Junta de Andalucía, Fundación Pública Andaluza Centro de Estudios Andaluces and Universidad Pablo de Olavide for the support under projects TIN2014-55894-C2-R, P12-TIC-1728, PRY153/14 and APPB813097, respectively.

Author information

Authors and Affiliations

Division of Computer Science, Universidad Pablo de Olavide, 41013, Seville, Spain
Ricardo L. Talavera-Llames, Rubén Pérez-Chacón, Alicia Troncoso & Francisco Martínez-Álvarez
Department of Computer Science, University of Seville, Seville, Spain
María Martínez-Ballesteros

Corresponding author

Correspondence to Francisco Martínez-Álvarez.

Editor information

Editors and Affiliations

Universidad Pablo de Olavide, Sevilla, Spain
Francisco Martínez-Álvarez
Universidad Pablo de Olavide, Sevilla, Spain
Alicia Troncoso
University of Salamanca, Salamanca, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado

About this paper

Cite this paper

Talavera-Llames, R.L., Pérez-Chacón, R., Martínez-Ballesteros, M., Troncoso, A., Martínez-Álvarez, F. (2016). A Nearest Neighbours-Based Algorithm for Big Time Series Data Forecasting. In: Martínez-Álvarez, F., Troncoso, A., Quintián, H., Corchado, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2016. Lecture Notes in Computer Science, vol 9648. Springer, Cham. https://doi.org/10.1007/978-3-319-32034-2_15

DOI: https://doi.org/10.1007/978-3-319-32034-2_15
Published: 14 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-32033-5
Online ISBN: 978-3-319-32034-2
eBook Packages: Computer Science, Computer Science (R0)
