Abstract
In recent years the volume of available information has grown considerably owing to new technologies such as sensor networks and smart meters, and new algorithms able to deal with big data are therefore necessary. This work proposes the distributed version of the k-means algorithm in the Apache Spark framework to find patterns in a big time series. Results for the electricity consumption of two buildings at a public university during 2011, 2012 and 2013 are presented and discussed. Finally, the computational time of the proposed methodology is compared with that of Weka, which serves as a benchmark.
Keywords
- Big data
- Time series
- Patterns
- Clustering
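The abstract does not detail the clustering step, so the following is only a minimal sketch of how a distributed k-means iteration can be organised: each partition computes partial sums and counts per cluster (map), and the partial results are merged to update the centroids (reduce). It is not the authors' implementation and does not use the Spark API. The 24-value daily profiles are synthetic, the names `map_partition`, `merge`, `kmeans` and `make_profile` are hypothetical, and the farthest-first initialisation is an assumption chosen to keep the example deterministic.

```python
import random
from functools import reduce


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest(point, centroids):
    """Index of the centroid nearest to the point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def map_partition(partition, centroids):
    """Map step: partial per-cluster sums and counts for one partition."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in partition:
        j = closest(p, centroids)
        counts[j] += 1
        sums[j] = [s + x for s, x in zip(sums[j], p)]
    return sums, counts


def merge(a, b):
    """Reduce step: combine the partial results of two partitions."""
    sums = [[x + y for x, y in zip(sa, sb)] for sa, sb in zip(a[0], b[0])]
    counts = [ca + cb for ca, cb in zip(a[1], b[1])]
    return sums, counts


def farthest_first(points, k):
    """Assumed initialisation: start at the first point, then repeatedly
    pick the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(partitions, k, iterations=10):
    # For brevity the initialisation gathers all points in one place;
    # a real cluster would distribute this step as well.
    all_points = [p for part in partitions for p in part]
    centroids = farthest_first(all_points, k)
    for _ in range(iterations):
        partials = [map_partition(part, centroids) for part in partitions]
        sums, counts = reduce(merge, partials)
        # Keep the old centroid if a cluster received no points.
        centroids = [
            [s / c for s in total] if c else old
            for total, c, old in zip(sums, counts, centroids)
        ]
    return centroids


def make_profile(rng, working_day):
    """Synthetic 24-hour consumption profile (illustrative values only)."""
    base = [10.0] * 24
    if working_day:
        for hour in range(8, 19):
            base[hour] = 50.0
    return [v + rng.uniform(-1.0, 1.0) for v in base]


rng = random.Random(42)
days = [make_profile(rng, working_day=(i % 7 < 5)) for i in range(70)]
partitions = [days[i::4] for i in range(4)]  # four simulated workers
centroids = kmeans(partitions, k=2)
```

Each resulting centroid is a 24-value daily profile, which is how a consumption pattern would be read off in this sketch. Because only sums and counts leave each partition, the data exchanged per iteration depends on the number of clusters rather than on the length of the series.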
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Perez-Chacon, R., Talavera-Llames, R.L., Martinez-Alvarez, F., Troncoso, A. (2016). Finding Electric Energy Consumption Patterns in Big Time Series Data. In: Distributed Computing and Artificial Intelligence, 13th International Conference. Advances in Intelligent Systems and Computing, vol 474. Springer, Cham. https://doi.org/10.1007/978-3-319-40162-1_25
DOI: https://doi.org/10.1007/978-3-319-40162-1_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-40161-4
Online ISBN: 978-3-319-40162-1
eBook Packages: Engineering, Engineering (R0)