A Data-Centric Approach for Reducing Carbon Emissions in Deep Learning

  • Conference paper
Advanced Information Systems Engineering (CAiSE 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13901)

Abstract

The growing popularity of Deep Learning (DL) in recent years has had a large environmental impact. Training DL models requires substantial computation and therefore substantial energy. The size of these models and the amount of data required to train them have grown exponentially, out of proportion to the resulting performance improvements. Recently, several model-centric approaches have been proposed to limit the environmental impact of AI. This paper complements them by proposing a data-centric “Green AI” approach that focuses on the data preparation phase of the DL pipeline. It proposes a general methodology, valid for any DL task, based on analyzing data characteristics, mainly the data quality and volume dimensions, and observing how they affect carbon emissions and performance across different models. Using this information, a human-in-the-loop (HITL) approach supports researchers in obtaining a modified and reduced version of a dataset that decreases the environmental impact of training while still achieving a specified performance goal. To demonstrate its validity, the methodology is applied to time series classification, and a prototype has been developed that shows the carbon emissions of DL training can be reduced by up to 50%.
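The selection step described in the abstract can be illustrated with a minimal sketch. It assumes that each candidate dataset reduction has already been trained and measured. The `Run` record, the `pick_greenest` and `emission_saving` helpers, and all numbers below are hypothetical placeholders invented for illustration; they are not the authors' prototype or results.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Run:
    """One training run on a reduced dataset (hypothetical record layout)."""
    volume: float        # fraction of the original training set kept, in (0, 1]
    accuracy: float      # performance measured on a held-out set
    emissions_kg: float  # CO2-equivalent emitted by training, in kg


def pick_greenest(runs: Sequence[Run], accuracy_goal: float) -> Optional[Run]:
    """Return the lowest-emission run that meets the accuracy goal, or None."""
    feasible = [r for r in runs if r.accuracy >= accuracy_goal]
    if not feasible:
        return None
    return min(feasible, key=lambda r: r.emissions_kg)


def emission_saving(baseline: Run, chosen: Run) -> float:
    """Relative emission reduction of `chosen` with respect to `baseline`."""
    return 1.0 - chosen.emissions_kg / baseline.emissions_kg


# Placeholder measurements, invented for illustration only.
runs = [
    Run(volume=1.00, accuracy=0.92, emissions_kg=0.040),
    Run(volume=0.75, accuracy=0.91, emissions_kg=0.031),
    Run(volume=0.50, accuracy=0.90, emissions_kg=0.024),
    Run(volume=0.25, accuracy=0.84, emissions_kg=0.012),
]

# The performance goal is the input a researcher would supply.
best = pick_greenest(runs, accuracy_goal=0.90)
if best is not None:
    print(f"keep {best.volume:.0%} of the data, "
          f"saving {emission_saving(runs[0], best):.0%} of emissions")
```

In this sketch the human in the loop only sets `accuracy_goal`. A fuller version would also let the researcher choose which data quality dimensions to vary, which the abstract mentions but does not detail here.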

Notes

  1. https://www.timeseriesclassification.com/dataset.php

  2. https://colab.research.google.com

  3. https://codecarbon.io/

  4. https://github.com/mfanselmo/Time-Series-Classification-GreenAI

  5. http://www.timeseriesclassification.com/description.php?Dataset=SwedishLeaf

Acknowledgements

This research was supported by the EU Horizon Framework grant agreement 101070186 (TEADAL) and by the Spoke 1 “FutureHPC & BigData” of the Italian Research Center on High-Performance Computing, Big Data and Quantum Computing (ICSC) funded by MUR Missione 4 - Next Generation EU (NGEU).

Corresponding author

Correspondence to Monica Vitali.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Anselmo, M., Vitali, M. (2023). A Data-Centric Approach for Reducing Carbon Emissions in Deep Learning. In: Indulska, M., Reinhartz-Berger, I., Cetina, C., Pastor, O. (eds) Advanced Information Systems Engineering. CAiSE 2023. Lecture Notes in Computer Science, vol 13901. Springer, Cham. https://doi.org/10.1007/978-3-031-34560-9_8

  • DOI: https://doi.org/10.1007/978-3-031-34560-9_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-34559-3

  • Online ISBN: 978-3-031-34560-9

  • eBook Packages: Computer Science, Computer Science (R0)
