A Data-Centric Approach for Reducing Carbon Emissions in Deep Learning

Anselmo, Martín; Vitali, Monica

doi:10.1007/978-3-031-34560-9_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13901)

Included in the following conference series:

International Conference on Advanced Information Systems Engineering

Abstract

The growing popularity of Deep Learning (DL) in recent years has had a large environmental impact. Training models requires a lot of processing and computation, and therefore a lot of energy. The size of these models and the amount of data required to train them have grown exponentially, out of proportion to the resulting performance improvements. Recently, some model-centric approaches have been proposed to limit the environmental impact of AI. This paper complements them by proposing a data-centric “Green AI” approach that focuses on the data preparation phase of the DL pipeline. A general methodology, valid for any DL task, is proposed. It is based on analyzing data characteristics, mainly the data quality and volume dimensions, and observing how these affect carbon emissions and performance on different models. With this information, a human-in-the-loop (HITL) approach supports researchers in obtaining a modified and reduced version of a dataset that decreases the environmental impact of training while achieving a specified performance goal. To demonstrate its validity, the methodology is applied to the time series classification task, and a prototype has been developed that demonstrates the possibility of reducing the carbon emissions of DL training by up to 50%.
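
The idea of trading data volume against a performance goal can be illustrated with a minimal sketch. This is not the authors' prototype: the synthetic dataset, the nearest-centroid "model", the candidate fractions, and every function name below are hypothetical, and the number of training samples processed is used only as a crude stand-in for measured carbon emissions (a real study would measure energy or emissions with a dedicated tracker). The sketch proposes the smallest training subset that meets the goal, leaving the final decision to the researcher, in the spirit of the human-in-the-loop step.

```python
import random


def make_dataset(n, seed=0):
    """Synthetic two-class 1-D data (assumption): class 0 centred at -2, class 1 at +2."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = rng.gauss(4 * label - 2, 1.0)
        data.append((x, label))
    return data


def train_centroids(train):
    """Toy 'training': compute one mean per class.

    Returns (centroids, cost), where cost is the number of samples processed,
    a placeholder for measured emissions.
    """
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in train:
        sums[y] += x
        counts[y] += 1
    centroids = {c: sums[c] / counts[c] for c in sums if counts[c] > 0}
    return centroids, len(train)


def accuracy(centroids, test):
    """Fraction of test samples whose nearest centroid matches their label."""
    correct = 0
    for x, y in test:
        predicted = min(centroids, key=lambda c: abs(x - centroids[c]))
        correct += predicted == y
    return correct / len(test)


def smallest_sufficient_fraction(train, test, goal,
                                 fractions=(0.1, 0.25, 0.5, 0.75, 1.0)):
    """Try increasing training fractions; return the first (fraction, accuracy, cost)
    that reaches the performance goal, or None if none does."""
    for frac in sorted(fractions):
        subset = train[: max(2, int(len(train) * frac))]
        centroids, cost = train_centroids(subset)
        acc = accuracy(centroids, test)
        if acc >= goal:
            return frac, acc, cost
    return None


train = make_dataset(2000, seed=1)
test = make_dataset(500, seed=2)
result = smallest_sufficient_fraction(train, test, goal=0.90)
if result is not None:
    frac, acc, cost = result
    print(f"candidate: {frac:.0%} of the data, accuracy {acc:.3f}, cost proxy {cost}")
```

In a realistic setting the cost proxy would be replaced by measured emissions per training run, and the candidate subsets would also vary data quality dimensions rather than volume alone.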

Acknowledgements

This research was supported by the EU Horizon Framework grant agreement 101070186 (TEADAL) and by the Spoke 1 “FutureHPC & BigData” of the Italian Research Center on High-Performance Computing, Big Data and Quantum Computing (ICSC) funded by MUR Missione 4 - Next Generation EU (NGEU).

Author information

Authors and Affiliations

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Milan, Italy
Martín Anselmo & Monica Vitali

Authors

Martín Anselmo
Monica Vitali

Corresponding author

Correspondence to Monica Vitali.

Editor information

Editors and Affiliations

The University of Queensland, Brisbane, QLD, Australia
Marta Indulska
University of Haifa, Haifa, Israel
Iris Reinhartz-Berger
Universidad San Jorge, Zaragoza, Spain
Carlos Cetina
Universitat Politècnica de València, Valencia, Spain
Oscar Pastor

About this paper

Cite this paper

Anselmo, M., Vitali, M. (2023). A Data-Centric Approach for Reducing Carbon Emissions in Deep Learning. In: Indulska, M., Reinhartz-Berger, I., Cetina, C., Pastor, O. (eds) Advanced Information Systems Engineering. CAiSE 2023. Lecture Notes in Computer Science, vol 13901. Springer, Cham. https://doi.org/10.1007/978-3-031-34560-9_8

DOI: https://doi.org/10.1007/978-3-031-34560-9_8
Published: 08 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-34559-3
Online ISBN: 978-3-031-34560-9
eBook Packages: Computer Science, Computer Science (R0)
