1 Introduction

Artificial Intelligence (AI) is taking an increasingly important role in industry and society. AI techniques have recently been introduced in autonomous driving, personalized shopping, and fraud prevention, to name just a few examples. A key challenge faced by today’s society, and one for which AI can bring an important advancement, is environmental sustainability. Climate change, pollution, biodiversity decline, poor health, and poverty have led governments and companies in recent years to focus their efforts and investments more and more on solutions to environmental sustainability problems, which are usually characterized by an inefficient and increasing use of resources. Environmental sustainability can be defined as a set of constraints regarding the use of renewable and nonrenewable resources on the one hand, and pollution and waste assimilation on the other (Goodland 1995). In this regard, in 2015, the United Nations published the “2030 Agenda for Sustainable Development”, the centerpiece of which is a set of 17 Sustainable Development Goals (United Nations 2015) to be fully achieved by 2030 to attain sustainable development in the economic, social, and environmental contexts, and to eliminate all forms of poverty.

AI-based algorithms can control autonomous drones used in water monitoring (Steccanella et al. 2020; Marchesini et al. 2021; Bianchi et al. 2023), extract new insights about environmental conditions from acquired data (Castellini et al. 2020; Azzalini et al. 2020), improve the healthiness of indoor environments (Capuzzo et al. 2022), or forecast demand in district heating networks (Bianchi et al. 2019; Castellini et al. 2021, 2022). Several AI techniques have been employed to address various environmental sustainability challenges. These approaches enable the efficient management of distributed resources within smart grids (Roncalli et al. 2019; Orfanoudakis and Chalkiadakis 2023), improve the power flow for DC grids (Blij et al. 2020), increase the utilization of renewable resources for electric vehicle charging (Koufakis et al. 2020), and mitigate carbon emissions in urban transportation by fostering ridesharing and reducing traffic congestion (Bistaffa et al. 2021, 2017). Furthermore, a crucial aspect of climate change prevention involves optimizing the energy consumption associated with heating and cooling residential properties. To tackle this issue, AI-based methods have been developed to enhance the efficiency of home systems (Panagopoulos et al. 2015; Auffenberg et al. 2017) and quantify the thermal efficiency of residences (Brown et al. 2021). Among this broad spectrum of AI techniques, in this survey we focus on Reinforcement Learning (RL) (Sutton and Barto 2018), which has recently obtained impressive successes, achieving human-level performance in several tasks, for instance in the context of games (Silver et al. 2016, 2017).

One of the most important and interesting challenges in today’s RL research is the application of RL algorithms to real-world domains, where uncertainty makes strategy learning and adaptation much more complex than in game environments. In particular, the application of RL to environmental sustainability has attracted, in the last decade, strong interest from both the computer science community and the communities of environmental sciences and business. Reducing carbon emissions requires increasing the usage of renewable resources, such as solar and wind power. While these resources are economically efficient, their stochastic and intermittent nature poses challenges in replacing nonrenewable energy sources within energy networks. RL, through systematic trial-and-error interaction with dynamic environments, offers a promising approach for learning optimal policies that can adapt to changing system dynamics and effectively manage environmental uncertainty. Thus, an RL agent is capable of handling variations in operating conditions, for instance, due to a change in resource availability or weather conditions.

This work surveys the recent use of RL to improve environmental sustainability. It provides a comprehensive overview of the different application domains where RL has been used, such as energy and water resource management, and traffic management. The goal is to show practitioners the state-of-the-art RL methods that are currently used to solve environmental sustainability problems in each of these domains. For each paper analyzed, we consider

  • The problem tackled,

  • The RL approach used,

  • The challenges faced,

  • The formalization of the RL problem (i.e., type of state/action space, type of transition model, type of RL method, performance measures used to evaluate the results).

The paper is structured as follows. Section 2 presents the surveys already available on topics close to RL and environmental sustainability. Section 3 presents the basic concepts of RL as well as a formalization of the main concepts. In Sect. 4, we present the research methodology used in our survey. Section 5 describes the results of our research, considering different levels of detail. In particular, in Sects. 5.1.1, 5.1.2 and 5.1.3, we provide a quantitative analysis of the state-of-the-art related to the application of RL in environmental sustainability over the last two decades. Then, Sect. 5.1.4 outlines domains where RL techniques are applied and the RL-based approaches employed to address environmental sustainability. In Sect. 5.2, our focus shifts to a subset of 35 main papers, for which we analyze the application domains of proposed RL techniques, provide technical insights into problem formalization, discuss the performance metrics used for evaluation, and consider the challenges addressed. Section 5.3 provides an in-depth analysis of each of these main papers. Finally, in Sect. 6 we discuss our findings, and in Sect. 7 we draw conclusions and summarize future directions.

2 Related work

The literature already provides some surveys on the application of RL to problems related to environmental sustainability, but all these works either focus only on specific aspects of environmental sustainability or also consider AI methods other than RL. For instance, Ma et al. (2020) focus on Energy-Harvesting Internet of Things (IoT) devices, offering insights into recent advancements addressing challenges in commercialization, standards development, context sensing, intermittent computing, and communication strategies. Charef et al. (2023) conduct a study considering various AI techniques, including RL, to enhance energy sustainability within IoT networks. They categorize studies based on the challenges they address, establishing connections between challenges and AI-based solutions while delineating the performance metrics used for evaluation. Within the domain of Architecture, Engineering, Construction, and Operation, Rampini and Re Cecconi (2022) concentrate on the application of AI techniques, including RL, in Asset Management. Their work reviews studies related to several aspects such as energy management, condition assessment, operations, risk, and project management, identifying key points for future development in this context. Alanne and Sierla (2022) shift their focus to smart buildings, discussing the learning capabilities of intelligent buildings and categorizing learning application domains based on objectives. They also survey the application of RL and Deep Reinforcement Learning (DRL) in decision-making and energy management, encompassing aspects like control of heating and cooling systems and lighting systems. Within the context of smart buildings and smart grids, Mabina et al. (2021) examine the utilization of Machine Learning (ML), including RL, for optimizing energy consumption and electric water heater scheduling, emphasizing the advantages of these approaches in Demand Response (DR) due to their interaction with the environment. Himeur et al. (2022) investigate the integration of AI-big data analytics into various tasks such as load forecasting, water management, and indoor environmental quality monitoring, focusing on the role of RL and DRL in optimizing occupant comfort and energy consumption. Yang et al. (2020) focus on the application of RL and DRL techniques to sustainable energy and electric systems, addressing issues such as optimization, control, energy markets, cyber security, and electric vehicle management.

In the realm of transportation systems, Li et al. (2023) explore various topics, including cooperative mobility-on-demand systems, driver assistance systems, autonomous vehicles (AVs), and electric vehicles (EVs). Sabet and Farooq (2022) study the state-of-the-art in the context of Green Vehicle Routing Problems, which involve reducing greenhouse gas (GHG) emissions and addressing issues like charging activities, pickup and delivery operations, and energy consumption. Moreover, the authors note that most of the works leverage metaheuristics, while the use of RL methods is uncommon. Chen et al. (2019) tackle sustainability concerns within the Internet of Vehicles, leveraging 5th generation mobile network (5G) technology, Mobile Edge Computing architecture, and DRL to optimize energy consumption and resource utilization. Rangel-Martinez et al. (2021) assess the application of ML techniques, including RL, in manufacturing, with a focus on energy-related fields impacting environmental sustainability. Sivamayil et al. (2023) explore a wide range of RL applications (e.g., Natural Language Processing, health care, etc.), emphasizing Energy Management Systems with an environmental sustainability perspective. Mischos et al. (2023) investigate Intelligent Energy Management Systems across diverse building environments, considering control types and optimization approaches, including ML, DL, and DRL. Yao et al. (2023) discuss the application of Agent-Based Modeling and Multi-Agent System modeling in the transition to Multi-Energy Systems, highlighting RL and suggesting future research directions in Multi-Agent Reinforcement Learning (MARL) for energy systems.

While these works address specific aspects of environmental sustainability using RL methods, our review takes a comprehensive approach, analyzing all contexts in which RL techniques have recently contributed to enhancing environmental sustainability. Our goal is to provide practitioners with insights into state-of-the-art methods for addressing environmental sustainability challenges across various application domains, including energy and water resource management and traffic management. In summary, the main contribution of this survey consists of offering an overview of RL application domains within the context of environmental sustainability.

3 Reinforcement learning: preliminaries and main definitions

In this section, we present the basic concepts of RL as well as a formalization of the main concepts. RL, a prominent machine learning paradigm, focuses on learning a policy that maximizes cumulative reward, i.e., a mapping that specifies which action should be selected, given the environment configuration, to achieve the best possible outcome. The key elements of RL are listed in the following (a minimal interaction loop tying these elements together is sketched after the list):

  • The agent is the entity that makes decisions and performs actions in the environment;

  • The environment represents the system with which the agent interacts and provides the agent with feedback on the performed action;

  • The policy is a function that defines the agent’s behavior considering the environment configuration (i.e., a map between what the agent observes and what the agent should do);

  • The reward is a numerical signal that provides feedback on the action performed by the agent;

  • The value function specifies state values, namely, how valuable it is to reach a state, considering also future states reachable from it;

  • The model of the environment (optional) is a stochastic function providing the probability of the next state given the current state and action; it allows simulating the behavior of the environment in response to the agent’s actions.
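
To make these elements concrete, the following minimal sketch shows a single episode of agent-environment interaction. The toy environment, the random policy, and the reward values are illustrative placeholders introduced only for exposition; they are not taken from any of the surveyed works.

```python
# A minimal sketch of the agent-environment loop tying together the elements
# above. The environment, policy, and reward below are illustrative
# placeholders, not taken from any of the surveyed works.
import random

class ToyEnvironment:
    """Two-state environment: the agent tries to reach state 1 and stay there."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves towards the "good" state, action 0 moves away from it
        self.state = 1 if action == 1 else 0
        reward = 1.0 if self.state == 1 else 0.0   # feedback signal from the environment
        return self.state, reward

def policy(state):
    """A deliberately naive policy: map the observed state to an action."""
    return random.choice([0, 1])

env = ToyEnvironment()
state = env.reset()
total_reward = 0.0
for t in range(10):                    # one short episode
    action = policy(state)             # the agent selects an action
    state, reward = env.step(action)   # the environment returns feedback
    total_reward += reward             # cumulative reward the agent seeks to maximize
print("return of this episode:", total_reward)
```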

RL methods (Sutton and Barto 2018) can be categorized into two main groups: model-free and model-based (Moerland et al. 2020). Over the past two decades, model-free methods have demonstrated significant success. Meanwhile, model-based approaches have become a focal point in current research due to their potential to enhance sample efficiency, i.e., to reduce the number of interactions with the environment required for learning. This efficiency is achieved by explicitly representing the model of the environment and incorporating relevant prior knowledge (Castellini et al. 2019; Zuccotto et al. 2022a, b). Additionally, model-based methods offer the advantage of addressing the risks associated with taking actions in partially observable environments (Mazzi et al. 2021, 2023; Simão et al. 2023) or partially known environments (Castellini et al. 2023).

A common framework to formalize the RL problem is the Markov Decision Process (MDP) (Puterman 1994). An MDP is a tuple \((S, A, T, R, \gamma )\) where S is a finite set of states, A is a finite set of actions, \(T:S \times A \rightarrow \Pi (S)\) is the transition model, where \(\Pi (S)\) is the space of probability distributions over states, \(R: S \times A \rightarrow {\mathbb {R}}\) is the reward function, and \(\gamma \in [0,1)\) is the discount factor. The agent’s goal is to maximize the expected discounted return \({\mathbb {E}}[\sum _{t=0}^{\infty } \gamma ^t R(s_t,a_t)]\) by acting optimally, namely, choosing in each state \(s_t\), at time t, the action \(a_t\) that maximizes the expected discounted return. The solution of an MDP is an optimal policy, namely, a function that optimally maps states into actions. A policy is optimal if it maximizes the expected discounted return. The discount factor \(\gamma\) reduces the weight of long-term rewards, guaranteeing convergence. In the case of partially observable environments, an extension of the MDP framework, namely the POMDP (Kaelbling et al. 1998), can be used. A POMDP is a tuple \((S, A, O, T, \Omega , R, \gamma )\) where the elements shared with the MDP are augmented by \(\Omega\), a finite set of observations, and \(O: S \times A \rightarrow \Pi (\Omega )\), the observation model. In contrast to MDPs, in POMDPs the agent is not able to directly observe the current state \(s_t\); instead, it maintains a probability distribution over the states S, called belief, which is updated at each time step. The belief summarizes the agent’s previous experience, i.e., the sequence of actions and observations that led the agent from an initial belief \(b_0\) to the current belief b. The solution of a POMDP is an optimal policy, namely, a function that optimally maps belief states into actions. In the following, we survey applications of RL to environmental sustainability, hence we investigate how the elements described in this section (e.g., the MDP modeling framework, RL algorithms, etc.) have been used so far to solve problems related to environmental sustainability.
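
To make the MDP notation concrete, the sketch below runs tabular Q-Learning, the model-free method that turns out to be the most frequently used in the surveyed works (see Sect. 5.1.4), on a small illustrative MDP. The transition probabilities, rewards, and hyperparameters are arbitrary values chosen only to exercise the update rule.

```python
# Tabular Q-Learning on a small illustrative MDP (S, A, T, R, gamma).
# The transition probabilities and rewards are arbitrary choices made only to
# exercise the update rule; they do not come from any surveyed application.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2
gamma, alpha, epsilon = 0.95, 0.1, 0.1

# T[s, a, s'] = probability of reaching s' when taking action a in state s
T = np.array([[[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]],
              [[0.0, 0.9, 0.1], [0.0, 0.1, 0.9]],
              [[0.0, 0.0, 1.0], [0.5, 0.0, 0.5]]])
# R[s, a] = immediate reward for taking action a in state s
R = np.array([[0.0, 0.0],
              [0.0, 1.0],
              [5.0, 0.0]])

Q = np.zeros((n_states, n_actions))
s = 0
for step in range(20000):
    # epsilon-greedy action selection
    a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=T[s, a])       # sample the transition model
    # Q-Learning update: bootstrap on the greedy value of the next state
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print("learned Q-values:\n", np.round(Q, 2))
print("greedy policy (action per state):", Q.argmax(axis=1))
```

Under standard conditions on the learning rate and exploration, the update \(Q(s,a) \leftarrow Q(s,a) + \alpha [R(s,a) + \gamma \max _{a'} Q(s',a') - Q(s,a)]\) converges to the optimal value function, from which the optimal policy is obtained greedily.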

4 Review methodology

In this section, we outline the research methodology used for this study. It consists of 5 steps: (i) the definition of the research questions, (ii) the paper collection process, (iii) the definition of inclusion and exclusion criteria, (iv) the identification of relevant studies based on the inclusion and exclusion criteria, and (v) data extraction and analysis.

Research questions. The first step involves defining the research questions we want to answer on the application of RL techniques for environmental sustainability. The goal of our questions is twofold: to offer a quantitative analysis of the state of the art related to the application of RL to environmental sustainability and to analyze the use of these techniques focusing on sustainability. Specifically, we aim to answer the following questions:

  • RQ1: How many academic studies have been published from 2003 to 2023 about RL for environmental sustainability?

  • RQ2: What were the most relevant publication channels used?

  • RQ3: In which countries were the most active research centers located?

  • RQ4: What were the application domains and the methodologies used?

  • RQ5: How was the RL problem formalized (i.e., type of state/action space, type of transition model, and type of dataset used)?

  • RQ6: Which evaluation metrics were used to assess the performance?

  • RQ7: What were the challenges addressed?

The databases we use to collect papers are those of the search engines Scopus and Web of Science. To limit the scope of the search to the application of RL approaches to environmental sustainability, we define the following search strings:

  • “reinforcement learning AND sustainable AND environment”;

  • “reinforcement learning AND environmental AND sustainability”;

  • “reinforcement learning AND environment AND sustainability”;

  • “reinforcement learning AND environmental AND sustainable”.

The search on the two databases led to a total of 375 papers, 236 collected from Scopus and 139 from Web of Science.

Selection criteria for the initial set of (181) papers. To refine the results of the search, we outline the following inclusion and exclusion criteria.

Inclusion criteria. To determine studies eligible for inclusion in this work, we consider the following criteria:

  • It is written in English;

  • It is clearly focused on RL for environmental sustainability;

  • In the case of duplicate articles, the most recent version is included.

Exclusion criteria. To further refine our search, we apply the following exclusion criteria: the study is an editorial, a conference review, or a book chapter.

Following these criteria, we found 181 papers (104 articles, 70 conference papers, and 7 reviews). We combine the information in the index keywords of these papers with their number of citations and their publication year. In particular, we compute the number of occurrences of each keyword to identify the application domains and methodologies most used in the literature. To this aim, we standardize the keywords to avoid spelling variations. Then, we combine these values with the number of citations and the publication year to identify the most recent and relevant studies. In cases where index keywords are missing, we use author keywords. For the only three papers that have neither author nor index keywords, we use the title to derive related keywords.
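
As an illustration of this keyword standardization and counting step, the short sketch below shows one possible implementation; the example records and merge rules are invented and do not reproduce the actual keyword lists of the 181 papers.

```python
# Illustrative sketch of keyword standardization and occurrence counting.
# The records and merge rules below are invented examples; the actual analysis
# was run on the index/author keywords of the 181 selected papers.
from collections import Counter

papers = [
    {"keywords": ["Reinforcement Learning", "Smart Grids", "Energy Consumption"]},
    {"keywords": ["reinforcement-learning", "smart grid", "traffic signal control"]},
    {"keywords": []},   # papers without keywords fall back to title-derived terms
]

def standardize(keyword):
    """Normalize case and hyphenation, then merge known spelling variations."""
    keyword = keyword.lower().replace("-", " ").strip()
    merge_rules = {"smart grids": "smart grid"}   # illustrative merge rules
    return merge_rules.get(keyword, keyword)

counts = Counter(standardize(kw) for paper in papers for kw in paper["keywords"])
print(counts.most_common())
```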

Selection criteria for the set of (35) main papers. To identify papers for the in-depth analysis, we apply the following criteria, which consider the most important keyword occurrences (i.e., the most frequent keywords), the publication year, and the number of citations relative to the publication year.

  • Presence of at least one keyword with no less than 10 occurrences;

  • Publication year from 2013 to 2023;

  • Number of citations:

    • Papers published in 2022–2021, at least 3 citations;

    • Papers published in 2020–2019, at least 10 citations;

    • Papers published in 2018–2013, at least 20 citations.

Following these criteria, we selected 35 studies that have been explored in-depth, and answers to the research questions defined above have been reported.

In the following sections, we first consider the initial 181 papers found using the search strings defined above and applying the inclusion/exclusion criteria. In Sect. 5.1.1 we answer question RQ1 for those papers, in Sect. 5.1.2 we answer question RQ2, in Sect. 5.1.3 we answer question RQ3, and in Sect. 5.1.4 we answer question RQ4. Namely, we first analyze the number of papers that focus on RL for sustainability published in the last 20 years, then we identify the main international conferences, workshops, and journals used to disseminate research, subsequently, we find the research centers that are particularly active in this research/application topic, and finally, we analyze the application domains and RL methodologies used. From Sect. 5.2, we start focusing only on the 35 main papers identified using the main paper selection criteria. In particular, we answer question RQ4 in Sect. 5.2.1, question RQ5 in Sect. 5.2.2, question RQ6 in Sect. 5.2.3, and question RQ7 in Sect. 5.2.4. Namely, for these main papers, we first analyze the application domains of RL techniques and the RL-based approaches used to tackle environmental sustainability; then we analyze the way in which the problem has been formalized; subsequently, we investigate the evaluation measures used; finally, we identify the main challenges addressed. Notice that questions RQ1, RQ2, and RQ3 have not been answered considering only the 35 main papers because these questions aim to provide a quantitative analysis of the state of the art as a whole, and this subset of articles is part of the 181 papers used to answer these three questions.

5 Results of the review

This section reports the results of the analysis provided in this survey, first for the initial set of 181 papers, then for the subset of the main 35 papers.

5.1 Analysis of the initial set of 181 papers

The initial set of papers, selected using the search strings of Sect. 4, is analyzed by answering questions RQ1, RQ2, RQ3, and RQ4.

5.1.1 RQ1: How many academic studies have been published from 2003 to 2023 about RL for environmental sustainability?

This research question aims to quantify the interest of the international scientific community in applying RL methods to environmental sustainability problems over the last 20 years. As shown in Fig. 1, the number of publications (pink dots) remained relatively low until 2018, with fewer than five publications per year. Since 2019, there has been rapid growth, reaching 53 papers in 2022, showing the increasing interest in this topic during the last few years. It is important to notice that the data for the year 2023 are updated to April 2023 and do not represent a decrease in the number of studies published. The application of the inclusion and exclusion criteria leaves no publications in the years 2004, 2005, 2010, and 2011. In Fig. 1, we also show that the increase in the number of publications fits an exponential pattern (green line) with a growth rate of 0.42 in the number of publications (from 2 papers in 2007 to 53 in 2022). To compute the regression model, we do not consider 2023 since its information is partial.
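
For illustration, a log-linear least-squares fit of this kind can be computed as sketched below. The yearly counts in the example are invented placeholders (years with zero publications would have to be excluded or otherwise handled), and since the exact fitting procedure is not detailed here, the sketch should not be expected to reproduce the reported 0.42 growth rate.

```python
# Sketch of an exponential (log-linear) regression on yearly publication counts.
# The counts below are invented placeholders, not the values behind Fig. 1, and
# 2023 is excluded because its information is partial.
import numpy as np

years = np.arange(2007, 2023)                       # 2007..2022, 2023 excluded
counts = np.array([2, 1, 3, 1, 1, 2, 3, 4, 4, 5,    # invented yearly counts
                   6, 8, 14, 22, 35, 53], dtype=float)

# Fit log(count) = a + r * (year - 2007), i.e., count ~ exp(a) * exp(r * t)
t = years - years[0]
r, a = np.polyfit(t, np.log(counts), 1)             # slope r is the growth rate
print(f"estimated exponential growth rate: {r:.2f}")
print("fitted counts:", np.round(np.exp(a + r * t), 1))
```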

Fig. 1 Academic studies published from 2003 to 2023. Pink dots represent the number of publications per year used to compute the regression model represented by the green line

Table 1 Journals and conferences with at least two publications

5.1.2 RQ2: What were the most relevant publication channels used?

With this research question, we aim to show the main channels used to disseminate research on the application of RL techniques to environmental sustainability problems. In Table 1, we show the journals and conferences with at least 2 publications. As can be seen, the topics of the journals and conferences are very varied. In particular, some of these communication channels are specific to sustainability, e.g., “Sustainability (Switzerland)” and “Sustainable Cities and Society”, and many are related to environmental aspects, such as “IOP Conference Series: Earth and Environmental Science” and “IEEE Transactions on Green Communications and Networking”. Moreover, in the third column of Table 1, we provide an overview of the scope of the publication channels. To this aim, we analyze the information presented on the website of each conference and journal about its scope, indicating whether it has a technical/informatics or application-oriented perspective (“CS” or “APP”, respectively) or a combination of them (“CS + APP”). As can be seen, most of the publication channels are application-oriented (2 conferences + 12 journals), followed by those that present a combined scope (2 conferences + 8 journals); finally, a few of them (3 conferences) have a more technical/informatics perspective.

5.1.3 RQ3: In which countries were the most active research centers located?

This research question aims to show the countries whose research centers are most concerned with the application of RL methods to environmental sustainability issues. With this in mind, we leverage the information in the Scopus and Web of Science databases about the 181 papers that were not excluded by the application of the inclusion and exclusion criteria. In Fig. 2, we show only the countries with at least 5 publications and, as we can see, the highest number of papers comes from research centers located in China (33 papers), followed by the United States (29 papers) and the United Kingdom (17 papers). It is important to note that most of these works are developed in collaboration between research centers in multiple countries, so we count each paper once for every collaborating country. To show co-author relationships, in Fig. 3, we represent only countries with at least 5 occurrences among the analyzed documents. Each country is depicted as a circle, a link between 2 circles represents a co-authorship relation, and the line weight is proportional to the number of papers in the co-authorship relationship. As we can see, the countries with more links are the United States (9 links), followed by Australia (7 links), and China and India (6 links).

Fig. 2 Number of publications per Country on RL approaches for environmental sustainability

Fig. 3 Co-author relationships with Country as a unit of analysis. Nodes represent countries, and links depict co-authorship relationships. The thickness of a link is proportional to the number of papers in the co-authorship relationship

5.1.4 RQ4: What were the application domains and the methodologies used?

This research question aims to analyze the application domains and the RL methodologies used for tackling issues related to environmental sustainability. To this aim, we analyze the index keywords of the 181 papers that were not excluded by applying the inclusion and exclusion criteria, and the author keywords for works with no index keywords. In Fig. 4, we show the application domains with more than 10 index keyword occurrences. We group the keywords into macro areas: for instance, in “Energy” we include keywords like “energy”, “energy conservation”, “energy consumption”, etc., while in “Electric energy” we group keywords such as “electric energy storage”, “electric load dispatching”, “smart grid”, etc. (see Appendix for details). The figure clearly shows that there is a wide variety of application domains, but most of the applications deal with sustainability issues related to energy fields.

Regarding the proposed approaches, we follow the same procedure described above for the application domains, grouping keywords that refer to the same method. For example, in “Actor-Critic” we group keywords such as “actor critic”, “advantage actor-critic (A2C)”, and “soft actor critic”. As we can see in Fig. 5, the most widely used RL method for dealing with environmental sustainability in different application domains is a state-of-the-art model-free algorithm, namely Q-Learning (Watkins 1989). It is important to note that, in the figure, we show only RL approaches, but there are also index keywords related to other approaches, like “genetic algorithm”, “simulated annealing”, etc.

Fig. 4 Overview of application domains. For each application domain (y-axis), we show the number of occurrences of keywords belonging to its macro-area (x-axis)

Fig. 5 Overview of RL methods used. For each RL method (y-axis), we show the number of occurrences of corresponding keywords (x-axis)

Moreover, we perform a bibliometric analysis on the co-occurrence of index keywords by using VOSviewer (Perianes-Rodriguez et al. 2016). A co-occurrence means that two keywords occur in the same work. After a data cleaning process, VOSviewer detects 17 clusters by considering keywords with at least 3 occurrences. In Fig. 6, each cluster corresponds to a color, and each element of the cluster, namely a keyword, is depicted by a circle in the cluster color. For instance, the blue cluster is made of several blue nodes, each of which contains a keyword (e.g., electric vehicles, charging (batteries)) belonging to the cluster. The size of the circle and of the circle label depends on the number of occurrences of the related keyword. Lines between items depict co-occurrences of keywords in a paper. Each cluster groups keywords identifying an application domain and/or the approaches used to tackle related environmental sustainability issues. For example, cluster 1 (red colored, on the top-right) is somewhat related to traffic signal control for traffic management through the application of control strategies. Cluster 2 (green colored, on the left) is related to power management and energy harvesting in wireless sensor networks.
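
The co-occurrence counts underlying such an analysis can be obtained as in the sketch below; the keyword sets are invented examples, and the actual clustering was performed with VOSviewer rather than with this code.

```python
# Sketch of counting keyword co-occurrences (two keywords appearing in the same
# paper) before clustering. The keyword sets below are invented examples.
from collections import Counter
from itertools import combinations

papers_keywords = [
    {"traffic signal control", "deep reinforcement learning", "emissions"},
    {"smart grid", "energy harvesting", "wireless sensor networks"},
    {"deep reinforcement learning", "smart grid", "electric vehicles"},
]

cooccurrences = Counter()
for keywords in papers_keywords:
    for pair in combinations(sorted(keywords), 2):   # every unordered keyword pair
        cooccurrences[pair] += 1

for pair, n in cooccurrences.most_common(5):
    print(pair, n)
```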

Fig. 6 Bibliometrics analysis on the co-occurrence of index keywords. Each color outlines a cluster, and each circle of the cluster color represents a keyword, while edges represent co-occurrences of keywords in the same work

5.2 Analysis of the 35 main papers

In this section, we focus on the 35 papers chosen using the selection criteria for the main papers (see Sect. 4). First, we provide a high-level analysis of the application domain and the RL approaches used to address environmental sustainability issues (research question RQ4). Then, we give an overview of the RL problem formalization (i.e., type of state/action space, type of transition model, type of RL method) (research question RQ5). Subsequently, we analyze the performance measures used to evaluate the results (research question RQ6). Finally, we evaluate the main challenges faced (research question RQ7).

5.2.1 RQ4: What were the application domains and the methodologies used?

In Table 2, we summarize the application domains and the RL approaches used in the selected works. First, we group the 35 main works according to their main related application domains (first column). It is important to note that application domains may overlap; consequently, we report all the application domains common to the papers in the same group. Then, we indicate for each paper (second column) the method behind the proposed technique (third column). The selected papers tackle environmental sustainability issues in the application domains shown in Fig. 4. In particular, the most relevant application domain relates to the macro area of “Energy”. Indeed, it involves more than half of the papers in the table, considering both the works in which it represents the main application domain and those in which it is related to the main application domain. In Table 2, we also show that 16 out of the 35 selected papers use DRL approaches such as Deep Q-Network (DQN) (Mnih et al. 2015) and Double Deep Q-Network (DDQN) (van Hasselt et al. 2016), and another 2 rely on DRL techniques in multi-agent contexts, such as Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (Lowe et al. 2017). RL techniques are used in 10 articles, where the most used method is Q-Learning, and 7 papers apply RL approaches in a multi-agent context. Finally, only 1 paper adopts a Genetic Algorithm-based RL (GARL) approach.

Table 2 Technical information about the selected works. In the third column, we report the RL methodology used
Table 3 Performance measures used by the authors to evaluate the proposed approaches (second column) and challenges they address (third column)

5.2.2 RQ5: How was the RL problem formalized (i.e., type of state/action space, type of transition model, and type of dataset used)?

This research question takes a technical point of view, which we think may help practitioners get an overview of the environments considered by the authors in developing the proposed methods. In Table 2, we summarize the information related to problem formulation that we found in the selected papers. For each paper, we point out whether the state and action spaces are continuous or discrete and whether the transition model is deterministic or stochastic. Finally, we provide information on the dataset used in the experiments, outlining whether real-world or synthetic data are used. It is important to note that not all papers explicitly provide this information. Thus, we mark with “*” all information inferred from reading the article. On the other hand, “N/A” specifies that the information available was not enough to infer the required data.

In the selected papers, most of the state and action spaces are discrete. Indeed, only 9 approaches use a continuous state space (third column of Table 2), and 6 use a continuous action space (fourth column of Table 2). Regarding the transition model, we can see that the model is stochastic in most cases where the information is available. In De Gracia et al. (2015), “Det.” and “St.” are both reported because the authors test the proposed methodology on both models. Finally, in the last column, we note that most of the experiments are performed on synthetic datasets. In fact, only 9 papers use real-world data, 6 of which combine them with synthetic data (“R + S” in the table), while 2 others use the real data to generate larger datasets from them (“S (R based)” in the table). Only Venkataswamy et al. (2023) test the proposed approach on both dataset types (“R, S” in the table).

5.2.3 RQ6: Which evaluation metrics were used to assess the performance?

This research question aims to provide an overview of the performance measures chosen by the authors to evaluate the proposed approaches in the 35 selected papers. In the second column of Table 3, we report information about the metrics found in the articles, which are also indicated in the in-depth analysis of each paper in Sect. 5.3. As we can see in Table 3, the performance measures vary widely depending on the application domain and the goal of the method proposed in each paper. For example, reward is used as a metric in 9 articles but is computed differently depending on the context. Concerning electric vehicles, in Sultanuddin et al. (2023), the reward corresponds to a penalty function considering the cost of charging and a departure incentive. Instead, in wastewater treatment plants (WWTPs), Chen et al. (2021) use a reward function that takes into account the operational cost, consisting of multiple components, such as energy cost and biogas price, and several indicators, like the energy consumed by the aeration and sludge treatment processes and GHG emissions. Another performance measure common to multiple application domains is, for example, energy consumption. Indeed, it is used in contexts such as water resources management (Emamjomehzadeh et al. 2023), WWTPs (Chen et al. 2021), data centers (Shaw et al. 2022), and AVs (Sacco et al. 2021). Even approaches related to the same application domain may differ in terms of performance measures depending on their objective. Considering, for example, the water resources context, both Emamjomehzadeh et al. (2023) and Skardi et al. (2020) evaluate their proposed approaches using resource level and nitrate concentration. However, in (Emamjomehzadeh et al. 2023), energy consumption and GHG emissions are also considered, while in (Skardi et al. 2020), resource allocation is used.

5.2.4 RQ7: What were the challenges addressed?

This research question aims to offer an overview of the issues that the authors have tackled within the 35 selected papers. In the third column of Table 3, we summarize information about the challenges addressed in the articles, which are also indicated in the in-depth analysis of each paper in Sect. 5.3. As with the performance measures, we can see in Table 3 that the challenges faced vary greatly depending on the application context and the goal of the method proposed in each paper. As an example, considering the domain of electric vehicles, Sultanuddin et al. (2023) address several challenges, like avoiding network energy overload at peak times, considering the uncertainty of driving patterns, and managing large state spaces. On the other hand, in addition to the challenge related to dimensionality, Zhang et al. (2021a) also address issues related to coordination and collaboration among agents, the competitiveness of charging demands, and the joint optimization of multiple objective functions. However, although not explicitly stated by the authors, the challenge that unites these papers is the development of approaches capable of adapting to changes in a dynamic environment and of managing the uncertainty associated with the environment, which, in many cases, arises from the use of renewable resources, whose stochastic and intermittent nature adds further complexity to the problem.

5.3 Analysis of single papers (grouped by application domain)

In this section, we group the 35 main papers by application domain and analyze each paper, answering research questions RQ4, RQ5, RQ6, and RQ7. This provides the reader interested in a specific application domain with in-depth knowledge of the main features of these papers. Notice that in answering RQ5, we use the information available in Table 2 and report in the text a “(*)” for all information inferred from reading the article.

5.3.1 Electric vehicles, Batteries, Energy

The transportation system is characterized by an increasing presence of EVs due to their eco-friendly features. Sultanuddin et al. (2023) propose a DDQN-based approach that provides a smart, scalable charging strategy for EV fleets, ensuring that all cars have sufficient charge for their trips without exceeding the maximum energy threshold of the power grid. The charging management system combines information on the current state of the network and vehicles with historical data, being able to schedule charging at least 24 hours in advance. In developing the proposed approach, the authors consider an environment with discrete actions and a stochastic transition model. The experimental evaluation is performed on a synthetic dataset using as metrics the reward, the voltage levels, the load curves, and the charging/discharging curves. The rapid growth in the popularity of EVs subjects the power grid infrastructure to challenges, such as preventing grid overload at peak times. Moreover, the authors address issues related to driving pattern uncertainty and the handling of large state spaces.

Zhang et al. (2021a) propose a framework for charging recommendations based on MARL, called Multi-Agent Spatio-Temporal Reinforcement Learning (MASTER). By leveraging a multi-agent actor-critic framework with Centralized Training and Decentralized Execution (CTDE), the proposed approach increases the collaboration and cooperation among agents, and it can make use of information about possible future charging competition through a delayed access strategy. The framework is further extended to multiple critics for addressing multi-objective optimization. MASTER works in environments characterized by discrete actions (*) and a deterministic transition model (*), and it has been tested on a real-world dataset. To evaluate its performance, the Mean Charging Wait Time (MCWT), Mean Charging Price (MCP), Total Saving Fee (TSF), and Charging Failure Rate (CFR) are used as performance measures. In the development of the proposed charging recommendation approach, the authors face several challenges, such as dealing with large state and action spaces, coordination and cooperation among agents in a large-scale system, the potential competitiveness of future charging requests, and the joint optimization of multiple objectives.

5.3.2 IoT

Recent years have seen rapid advances in IoT technology, enabling the development of smart services such as smart cities, buildings, and oceans. Regarding smart cities, Ajao and Apeh (2023) consider the Industrial Internet of Things and present a framework for addressing edge computing vulnerabilities. Indeed, edge computing security threats endanger the sustainable functioning of urban infrastructure through various attacks, such as Man-in-the-Middle and denial of service. In particular, to tackle authentication and privacy violation problems, this work proposes a secure framework modeled as a Petri Net, namely Secure Trust-Aware Philosopher Privacy and Authentication (STAPPA), on which a Distributed Authorization Algorithm is implemented. Moreover, a GARL approach is developed to optimize the network during learning, detect anomalies, and optimize routing. This work considers an environment characterized by discrete state and action spaces (*) and a stochastic transition model. The authors test the proposed approach on a synthetic dataset and assess the anomaly detection performance using the popular detection accuracy, recall, precision, specificity, and F-measure metrics. Ajao and Apeh (2023) deal with security challenges, in particular authentication and privacy violation problems.

Zhang et al. (2021b) propose an IoT-based Smart Green Energy (IoT-SGE) management system, enabled by DRL, for improving the energy management of power grids. The proposed approach is able to balance power availability and demand by keeping grid states steady, thus reducing power wastage. In developing IoT-SGE, the authors consider an environment with discrete states (*), continuous actions (*), and a deterministic transition model (*). The proposed approach has been evaluated on a synthetic dataset using operational logs, power wastage and requirement, and average failure ratio as metrics. The authors address an energy sustainability issue; in particular, they aim to manage energy requirements and allocate smart power systems.

In the context of smart ocean systems, Han et al. (2020) present an analytical model to evaluate the performance of an Internet of Underwater Things network with energy harvesting capabilities. The goal of this work is the maximization of IoT node throughput by optimally selecting the window size. To this aim, the authors propose an RL approach and leverage the Branch and Bound method to solve the optimization problem by autonomously adapting random access parameters through interaction with the network environment. For a realistic scenario, a MARL approach is proposed to deal with the lack of network information. In this case, random access parameters are autonomously adapted by using a distributed Multi-Armed Bandit (MAB)-based algorithm for each node. The environment considered in this work is characterized by deterministic actions (*) and a stochastic transition model. The authors test the proposed approach on a synthetic dataset, evaluating its performance in channel access regulation in relation to the number of ready nodes per time slot and throughput. Finally, this work addresses a fairness issue due to spatial uncertainty in underwater acoustic communication, which the authors deal with by formalizing an optimization problem for maximizing the throughput of the IoT network nodes.

5.3.3 Water resources

Water resource management is a key aspect of sustainable development, but it usually does not include social aspects. Emamjomehzadeh et al. (2023) propose a novel urban water metabolism model that combines urban metabolism with the Water, Energy, and Food (WEF) nexus (Radini et al. 2021) and can thus consider the interconnections among water, energy, food, material, and GHG emissions. Moreover, this work proposes a physical-behavioral model that relates the proposed approach to a MARL agent-based model, neither fully cooperative nor fully competitive, developed using Q-Learning. In this case, the only technical information available concerns the use of a synthetic dataset. The proposed approach is evaluated in terms of water table level, nitrate density, energy usage, and GHG emissions. Considering water resource management challenges related to sustainability, the authors aim to model and manage the WEF nexus for Integrated Water Resource Management in an urban area, taking into account stakeholders’ characteristics.

Skardi et al. (2020) propose, instead, an approach for quantifying and including social attachments in water and wastewater allocation tasks. This work proposes a paired physical-behavioral model, and the authors leverage Q-Learning to include social and behavioral aspects in the decision-making process. Specifically, they use the approach proposed by Bazzan et al. (2011) to integrate Social Analysis into Q-Learning, and they choose between individual or social behavior through the use of specific reward functions. In developing the proposed method, both a deterministic and a stochastic transition model are considered. Tests are performed on a dataset that combines real-world and synthetic data, and the performance evaluation is conducted considering the water and treated wastewater allocated to the agents, water and groundwater levels, and the concentration of nitrates to measure groundwater quality. Using Social Network Analysis, the authors tackle a key challenge in common resource management, i.e., the cooperation among agents. Also, they aim to quantify and include social attachments in water resource management.

5.3.4 Emissions/pollution

The development of WWTPs has a positive impact on environmental protection by reducing pollution but, at the same time, they consume resources and produce GHG emissions as well as residual sludge. With this in mind, Chen et al. (2021) propose an approach based on MADDPG to control Dissolved Oxygen (DO) and chemical dosage at once and improve sustainability accordingly. Specifically, the proposed approach uses two agents, one to control DO and one to control chemical dosage, whose reward functions are designed based on life cycle cost and on various Life Cycle Assessment mid-point indicators, respectively. The proposed approach is developed considering an environment with continuous state and action spaces and tested on a synthetic dataset. To evaluate the training process, the reward and the Q-values determined by the trained critic networks are used as metrics, while to analyze the variation of the influents and control parameters, the authors leverage the influents (COD, TN, TP, and NH\(_3\)-N), inflow rate, DO, and dosage values. Finally, energy consumption, cost, Eutrophication Potential (EP), and GHG emissions are used to assess the impact of the proposed approach. Since WWTPs reduce contaminants and environmental pollution but, at the same time, consume resources and produce GHG emissions as well as residual sludge, the authors seek to optimize their impact on environmental sustainability.

Intelligent fleet management is crucial in mitigating direct GHG emissions in open-pit mining operations. In this context, Huo et al. (2023) propose a MARL-based dispatching system for reducing GHG emissions. To this aim, this work presents an environment for haulage simulation that integrates a component for the real-time computation of GHG emissions. Then, Q-Learning is leveraged to improve fleet productivity and reduce trucks’ emissions by decreasing their waiting time. In the development of the proposed approach, an environment characterized by discrete state (*) and action spaces is considered. Tests are performed on a synthetic dataset, and productivity, number of operational mistakes, GHG emissions, and time spent in queue are used as evaluation metrics. In this work, the authors tackle operational randomness and uncertainties in fleet management for reducing haul trucks’ GHG emissions in open-pit mining operations.

5.3.5 Agriculture

In the context of sustainable agriculture, one of the key aspects of food security is crop yield prediction. Elavarasan and Durairaj Vincent (2020) tackle this problem by using a DRL approach, specifically a Deep Recurrent Q-Network (DRQN) (Hausknecht and Stone 2015) model, which consists of a Recurrent Neural Network (RNN) (Rumelhart et al. 1986) on top of a DQN. The proposed approach sequentially stacks the RNN layers, feeds the network with pre-trained parameters, and adds a linear layer to map the RNN output into Q-values. The Q-Learning network builds a crop yield prediction environment as a ‘yield prediction game’ that leverages both parametric feature combinations and thresholds useful in agricultural production. The authors consider an environment with discrete states (*) and test their approach on a dataset combining real-world and synthetic data, evaluating the performance by using the following metrics: Determination Coefficient (R2), Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Median Absolute Error (MedAE), Mean Squared Logarithmic Error (MSLE), Mean Absolute Percentage Error (MAPE), Probability Density Function (PDF), Explained Variance Score, and accuracy. Finally, Elavarasan and Durairaj Vincent (2020) address issues related to the application of Deep Learning (DL) methods to crop yield prediction for increasing food production. Specifically, the authors tackle the incapability of DL approaches to directly map raw data, linearly or non-linearly, to crop yield values, and the strong dependence of their effectiveness on the quality of the features extracted from the data.

5.3.6 Data, energy

Data centers are among the largest consumers of energy. Shaw et al. (2022) propose an RL-based Virtual Machine (VM) consolidation algorithm named Advanced Reinforcement Learning Consolidation Agent (ARLCA), whose aim is to simultaneously improve energy efficiency and service delivery guarantees. In this work, a global resource manager constantly monitors the state of the system and identifies hosts that may become overloaded as resource demand changes over time. The proposed approach rebalances the VM distribution and avoids the rapid overloading of hosts while ensuring efficient operation. This work presents two implementations of ARLCA based on two RL methods, i.e., Q-Learning and SARSA, and it tests two different approaches to balance the exploration-exploitation tradeoff, namely \(\epsilon\)-greedy and softmax. Finally, the authors leverage the Potential-Based Reward Shaping (Ng et al. 1999) technique to include domain knowledge in the reward structure and speed up the learning process. ARLCA works in an environment with discrete state and action spaces (*) and a stochastic transition model. Its performance is evaluated on a synthetic (real-world-based) dataset. To evaluate the proposed VM consolidation algorithms, energy consumption, Service Level Agreement Violations (SLAV), number of migrations, and Energy Service Level Agreement Violations (ESV) are used as performance measures. In this work, the authors tackle a key challenge for cloud computing services, namely energy awareness. Further, they also face the slow convergence to the optimal policy of conventional RL algorithms.
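
For readers unfamiliar with the two exploration strategies and the reward shaping technique mentioned above, the sketch below gives generic textbook versions of them; it is an illustration of the standard techniques, not the ARLCA implementation, and all parameter values are arbitrary.

```python
# Generic sketch of epsilon-greedy and softmax action selection, and of a
# potential-based shaping term in the style of Ng et al. (1999). This is an
# illustration of the standard techniques, not the ARLCA implementation.
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore uniformly, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_action(q_values, temperature=1.0):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    prefs = np.asarray(q_values, dtype=float) / temperature
    prefs -= prefs.max()                          # numerical stability
    probs = np.exp(prefs) / np.exp(prefs).sum()
    return int(rng.choice(len(q_values), p=probs))

def shaped_reward(reward, potential_s, potential_s_next, gamma=0.99):
    """Potential-based reward shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential_s_next - potential_s

q = [0.2, 0.5, 0.1]
print(epsilon_greedy(q), softmax_action(q), shaped_reward(1.0, 0.3, 0.8))
```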

Renewable energy Aware Resource management (RARE), a DRL approach for job scheduling in a green data center, is presented in Venkataswamy et al. (2023). This work proposes a customized actor-critic method in which the authors use three Deep Neural Networks (DNNs): the encoder, the actor, and the critic. The encoder summarizes information about the state of the environment into a compact representation, used as input for both the actor and the critic. The actor returns the probability of choosing each scheduling action, while the critic estimates, for each action, the total expected value achieved by starting in the current state and applying that action. Moreover, since DRL requires a significant amount of interaction with the environment to explore it and then to adapt a randomly initialized DNN policy, the authors leverage an offline learning algorithm, namely Behavioral Cloning, to learn a policy based on existing heuristic policy data used as prior experience. In particular, the actor network is trained to imitate the action selection process of the data within the replay memory. In developing RARE, the authors consider an environment characterized by discrete states (*) and actions, and they test the performance on both synthetic and real-world datasets by using the total job economic value as a metric. In this work, the authors tackle several challenges related to the application of RL techniques to the context of green data centers. The first issue relates to the environment: the dynamics of green data center environments make the scheduling process difficult, as it has to consider and manage the intermittent and variable nature of renewable energy sources. Moreover, the lack of uniformity in the environments makes it challenging to compare different approaches. The second challenge concerns the absence of discussion regarding the effect of system design choices (e.g., the planning horizon size), which makes it hard to clarify the reasons for the better performance of RL schedulers over heuristic policies. Furthermore, the authors discuss the employment of RL schedulers as black boxes, without considering different configurations, such as the size of the neural network, which could lead to improved performance. Finally, the last challenge highlights that existing RL schedulers do not focus on learning from and improving available heuristic policies.
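
The behavioral cloning step described above can be illustrated, in a strongly simplified form, with the sketch below: a small actor network is trained with a supervised cross-entropy loss to imitate the actions of a heuristic policy stored in a replay memory. Network sizes, names, and data are invented placeholders and do not reflect the RARE architecture (which also includes an encoder and a critic).

```python
# Simplified sketch of offline behavioral cloning: an actor is trained to
# imitate a heuristic scheduling policy. All sizes and data are illustrative
# placeholders, not the RARE implementation.
import torch
import torch.nn as nn

state_dim, n_actions = 32, 8                     # illustrative dimensions

actor = nn.Sequential(                           # stands in for encoder + actor
    nn.Linear(state_dim, 64), nn.ReLU(),
    nn.Linear(64, n_actions),                    # logits over scheduling actions
)
optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def behavioral_cloning_step(states, heuristic_actions):
    """One supervised update pushing the actor towards the heuristic's choices."""
    logits = actor(states)                        # (batch, n_actions)
    loss = loss_fn(logits, heuristic_actions)     # heuristic_actions: integer ids
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative batch sampled from a (hypothetical) replay memory of heuristic data
states = torch.randn(128, state_dim)
actions = torch.randint(0, n_actions, (128,))
print(behavioral_cloning_step(states, actions))
```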

5.3.7 Urban traffic and transportation

In recent years, the traffic congestion level has increased significantly with a consequent negative impact on the environment. Ounoughi et al. (2022) present EcoLight, an approach for controlling traffic signals based on DRL, which aims to reduce noise pollution, CO\(_2\) emissions, and fuel consumption. The proposed method combines the Sequence to Sequence Long Short Term Memory (SeqtoSeq-LSTM) prediction model with the DQN algorithm. SeqtoSeq-LSTM is used to forecast the traffic noise level that is part of the traffic information given as input to the DQN to determine the action to perform. EcoLight works in environments with discrete actions (*) and has been tested on a real-world dataset. The performance of EcoLight is evaluated by using the MSE, MAE, noise levels, CO\(_2\) emission, and fuel consumption as metrics. In this work, the authors tackle the issue of developing a control method that considers not only mobility and current traffic conditions but also integrates sustainability and proactivity.

On the other hand, Alizadeh Shabestray and Abdulhai (2019) present Multimodal iNtelligent Deep (MiND), a DRL-based traffic signal controller that considers both regular vehicles and public transit and leverages sensor information, like occupancy, position, and speed, to optimize the flow of people through an intersection by using DQN. In developing MiND, the authors consider an environment characterized by discrete states (*) and actions and a stochastic transition model, and they test the proposed approach on a synthetic dataset. To assess the performance of the proposed approach, the following measures are used: average intersection travel time, average in-queue time, average network travel time, and weighted average intersection person travel time. In this work, the authors have to fulfill some important requirements to develop a real-time adaptive traffic signal controller. Indeed, the controller has to consider both regular vehicle and public transit traffic and leverage sensor data on vehicle speed, position, and occupancy; moreover, the decision-making process should be fast.

Aziz et al. (2018) present an RL-based approach to control traffic signals in connected vehicle environments for reducing travel delays and GHG emissions. The proposed method, the R-Markov Average Reward Technique (RMART), leverages congestion information sharing among neighbor signal controllers and a multi-reward structure that can dynamically adapt the reward function according to the level of congestion at intersections. The considered environment presents discrete state (*) and action spaces (*) and a stochastic (*) transition model. The authors test RMART on a synthetic dataset, and to evaluate its performance they use as metrics the average delay, stopped delay, number of stops, and network-wide delay, while to assess the performance from a sustainability point of view they leverage emissions, i.e., CO, CO\(_2\), NOX, VOC, and PM10. Finally, this work deals with the traffic signal control problem to reduce travel delays and GHG emissions by addressing the following issues: the sharing of congestion information among neighbor signal controllers and the dynamic adaptation of the reward function on the basis of the congestion level.

Reducing the number of drivers who commute in search of car parking in urban centers has a positive impact on environmental sustainability. In this context, Khalid et al. (2023) propose a Long-range Autonomous Valet Parking framework that optimizes the path planning of AVs to minimize distance while serving all users by picking them up and dropping them off at their required spots. The authors propose two learning-based solutions: a Double-Layer Ant Colony Optimization (DL-ACO) and a DQN-based algorithm. DL-ACO can be applied in new or unfamiliar environments, while DQN can be used in familiar environments to make efficient and fast decisions since it is pre-trainable. The DL-ACO approach determines the most efficient path between pairs of spots and subsequently establishes the optimal order in which users can be served. To deal with dynamic environments, a DQN-based algorithm is proposed in which the agent learns to solve the task by interacting with the environment, using experience replay memory and a target network. The proposed techniques aim to improve the carpool and parking experience while reducing the congestion rate. In this work, the environment considered is characterized by discrete states (*) and actions (*), and a deterministic (*) transition model. The proposed approach is tested on a synthetic dataset, and execution time, reward, planned path, and distance are used as performance measures. In this work, the authors deal with path planning problems in dynamic environments while ensuring the quality of experience for each user, optimizing the order of user pick-up and drop-off, and finally minimizing the overall distance.

5.3.8 Buildings

Buildings are interesting from a DR and Demand Side Management point of view. In this context, Kathirgamanathan et al. (2021) leverage a DRL algorithm, namely Soft Actor-Critic (SAC), with the aim of automating energy management and harnessing energy flexibility by controlling the cooling set point in a commercial building environment. In developing the proposed approach, the authors regard an environment with continuous states, and they evaluate the performance on a dataset that combines real-world and synthetic data, using as evaluation metrics the energy purchased, energy cost, discomfort, total reward, temperature evolution, and power demand. Kathirgamanathan et al. (2021) tackle the application of DRL methods to automate DR without the need for a building-specific model, as well as their robustness to different operating environments and their scalability. Moreover, the authors point out that the lack of well-established environments makes it challenging to compare RL algorithms across different buildings.
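
For reference, SAC is an off-policy actor-critic method that maximizes an entropy-regularized return,

\[
J(\pi) = \sum_{t} \mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\Big[\, r(s_t,a_t) + \alpha\,\mathcal{H}\big(\pi(\cdot\mid s_t)\big) \Big],
\]

where \(\mathcal{H}\) denotes the policy entropy and \(\alpha\) trades off reward and exploration; this entropy term contributes to the robustness across operating conditions that the authors seek for DR control.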

De Gracia et al. (2015) instead consider Thermal Energy Storage techniques, and in particular latent heat storage, to maximize energy savings by leveraging a Ventilated Double Skin Facade (VDSF) with Phase Change Material (PCM) used as a cold energy storage system. By using an RL approach, i.e., SARSA(\(\lambda\)), the authors control the VDSF to optimally schedule the solidification of the PCM through mechanical ventilation during nighttime and the release of the stored cold into the indoor environment at peak demand times, considering weather and indoor conditions. The environment considered in this work presents discrete states (*), discrete actions (*), and a deterministic transition model. Moreover, the proposed approach is evaluated on a synthetic dataset considering electrical energy savings. This work aims to maximize energy savings by considering both the benefit of the VDSF and the energy used in the solidification process. Therefore, it is crucial to determine the best time for the charging process that solidifies the PCM and stores coldness.
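
A minimal tabular SARSA(\(\lambda\)) sketch with accumulating eligibility traces is given below; the state and action encodings (hour-of-day bins and a {ventilate, idle, release} action set) are illustrative placeholders and not the formulation of De Gracia et al. (2015).

```python
import numpy as np

n_states, n_actions = 24, 3              # placeholder: hour-of-day bins x {ventilate, idle, release}
alpha, gamma, lam, eps = 0.1, 0.95, 0.9, 0.1
Q = np.zeros((n_states, n_actions))      # action-value table
E = np.zeros_like(Q)                     # eligibility traces

def eps_greedy(s):
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def sarsa_lambda_step(s, a, r, s_next):
    """One on-policy SARSA(lambda) update with accumulating traces."""
    a_next = eps_greedy(s_next)
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]   # TD error
    E[s, a] += 1.0                       # accumulate the trace of the visited pair
    Q[:] += alpha * delta * E            # propagate the error along all traces
    E[:] *= gamma * lam                  # decay the traces
    return a_next
```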

5.3.9 Manufacturing

Manufacturing industries are among the largest energy consumers, so it is crucial to develop approaches that make them more energy efficient. In this regard, Wang and Wang (2022) tackle the Energy-Aware Distributed Hybrid Flow-shop Scheduling Problem (EADHFSP). The goal of this work is to simultaneously minimize two conflicting objectives: makespan and Total Energy Consumption. To this aim, the authors formulate a mixed-integer linear programming model of the EADHFSP and combine a Cooperative Memetic Algorithm with an RL-based agent to solve the problem. Two heuristics are combined to initialize the population with diverse solutions, and an improvement scheme then refines solutions using the operator selected by a policy agent, while solution selection relies on a decomposition strategy that balances convergence and diversity. The environment considered in this work is characterized by discrete state (*) and action (*) spaces and a deterministic (*) transition model, and the performance of the presented approach is tested on a synthetic dataset using the Overall Nondominated Vector Generation, the C metric, the hypervolume, and D1\(_R\) as evaluation metrics. This work addresses the EADHFSP with the minimization of makespan and total energy consumption, a challenging problem due to the simultaneous optimization of two conflicting objectives.
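
The improvement scheme, in which an agent picks the refinement operator to apply to a candidate schedule, can be sketched as a simple value-based selection rule over operators. The operator names, the stateless (bandit-style) update, and the improvement-based reward below are illustrative assumptions, not the actual design of Wang and Wang (2022).

```python
import random

# Hypothetical local-search operators for a distributed flow-shop schedule
operators = ["swap_jobs", "shift_job", "reassign_factory", "adjust_speed"]
Q = {op: 0.0 for op in operators}        # value estimate per operator
alpha, eps = 0.2, 0.15

def select_operator():
    """Epsilon-greedy choice of the refinement operator to apply."""
    if random.random() < eps:
        return random.choice(operators)
    return max(Q, key=Q.get)

def update_operator_value(op, improvement):
    """Reward the operator with the (hypothetical) improvement it produced,
    e.g. a weighted reduction of makespan and total energy consumption."""
    Q[op] += alpha * (improvement - Q[op])
```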

Leng et al. (2021) focus on Printed Circuit Board (PCB) manufacturing and propose a Loosely-Coupled Deep Reinforcement Learning (LCDRL) model for energy-efficient order acceptance decisions. The authors leverage DL, specifically a Convolutional Neural Network (LeCun 1989), to obtain an accurate prediction of the production cost, makespan, and carbon consumption of each order from historical labeled order data. The proposed approach then combines the forecasted data with order features to decide whether to accept an order and to determine the optimal acceptance sequence by using an RL approach based on Q-Learning. The authors regard an environment with discrete actions (*) and a stochastic transition model, and they test the proposed method on a synthetic dataset derived from real-world data. As performance measures, the metrics MSE, MSLE, RMSE, and R\(^2\) are used to evaluate the prediction accuracy of LCDRL, while the performance of the approach is assessed in terms of unit profit, total profit, and acceptance rate. This work tackles the problem of order acceptance in PCB manufacturing to achieve energy efficiency, reduce carbon emissions, and improve material usage. Two critical aspects of PCB manufacturing are demand uncertainty and order customization, which can lead to different profits, energy consumption, and carbon emissions; both factors have to be considered in production planning under production constraints.

5.3.10 Mobile and wireless communication

Sustainable energy infrastructures need high-quality communication systems that connect user facilities and power plants and enable information exchange. In this context, Liu et al. (2021) propose the use of a 6G network and Intelligent Reflective Surface (IRS) technology to create a wireless networking platform, and they suggest a DRL method to optimize the phase shift of the IRS and therefore improve the communication quality. By combining the 6G network with IRS technology, the authors provide high-quality coverage while gaining energy-saving benefits. In particular, this work proposes the application of the Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al. 2016) algorithm to configure the IRS phase shift and enhance system coverage. The authors consider an environment characterized by continuous state and action spaces. The performance of the proposed approach is assessed on a synthetic dataset with two reflection units, using as metrics the achievable rate, which measures the service quality, and the transmission power. Developing sustainable energy infrastructure is challenging from several points of view: Liu et al. (2021) tackle the need for an effective, globally covering communication system based on IRS technology, whose phase-shift configuration is itself challenging.
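
Since DDPG recurs in several of the reviewed works (also in Feng et al. 2023 and Bouhamed et al. 2020 below), we recall its two defining updates: the deterministic policy gradient used to train the actor \(\mu(s\mid\theta^\mu)\) against the critic \(Q(s,a\mid\theta^Q)\), and the soft update of the target networks,

\[
\nabla_{\theta^\mu} J \approx \mathbb{E}_{s\sim\rho}\Big[\nabla_a Q(s,a\mid\theta^Q)\big|_{a=\mu(s\mid\theta^\mu)}\;\nabla_{\theta^\mu}\mu(s\mid\theta^\mu)\Big],
\qquad
\theta' \leftarrow \tau\,\theta + (1-\tau)\,\theta',
\]

with \(\tau \ll 1\). The continuous action produced by the actor is what allows quantities such as IRS phase shifts (and, later, sensor trajectories) to be adjusted without discretization.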

In the context of two-tier urban Heterogeneous Networks (HetNets), Miozzo et al. (2015) model the Small Cell (SC) network as a decentralized multi-agent system. The authors’ goal consists of improving system performance and the self-sustainability of the SCs in terms of energy consumption. To this aim, they leverage the distributed Q-Learning algorithm so that every agent learns an appropriate Radio Resource Management (RRM) policy. Miozzo et al. (2015) is extended in Miozzo et al. (2017), where the algorithm is first trained offline to compute the Q-values used to initialize the Q-tables of the SCs employed in the online method. In both approaches, the environment presents discrete states (*) and actions (*) and the dataset used is synthetic. In both works, the authors evaluate the proposed approaches in terms of network performance, by using the throughput gain and traffic drop rate, and in terms of energy performance, by using the energy efficiency and the energy efficiency improvement. Moreover, Miozzo et al. (2015) analyze the behavior of the HetNet considering traffic demand, harvested energy, battery level, policy, and normalized load at the macro Base Station (BS). Also, the authors consider as performance metrics the total amount of energy spent by the system, the average load, the average cell load for the macro BS, the battery outage, and Jain’s fairness index to assess the Quality of Service (QoS) improvement. Finally, Miozzo et al. (2017) assess the computed policy by leveraging the switch-off rate as a performance measure and use the battery level to analyze the convergence of the online algorithm and to evaluate the excess energy over the storage capacity. Both works address the problem of introducing energy harvesting into the computation of sleeping strategies to achieve energy efficiency, which is challenging due to the irregular and intermittent nature of renewable energy sources.
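
The offline-initialization idea can be sketched as follows: a Q-table is learned on simulated traces and then copied into each small cell, which keeps refining it online with the standard Q-Learning update. The toy environment, the state and action dimensions, and the reward below are placeholders used only to make the sketch self-contained; they are not the model of Miozzo et al. (2017).

```python
import numpy as np

n_states, n_actions = 64, 2          # placeholder: e.g. (battery, traffic) bins x {sleep, active}
alpha, gamma, eps = 0.1, 0.9, 0.1

class ToyCellEnv:
    """Toy stand-in for a small-cell environment with random dynamics."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
    def reset(self):
        self.t = 0
        return int(self.rng.integers(n_states))
    def step(self, action):
        self.t += 1
        s_next = int(self.rng.integers(n_states))
        reward = float(self.rng.normal())          # placeholder reward
        return s_next, reward, self.t >= 50

def q_learning(Q, env, episodes):
    """Tabular Q-Learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = np.random.randint(n_actions) if np.random.rand() < eps else int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
            s = s_next
    return Q

# Offline phase: learn on a simulator, then seed each SC with the resulting table
Q_offline = q_learning(np.zeros((n_states, n_actions)), ToyCellEnv(seed=0), episodes=500)
# Online phase: each SC continues learning from the pretrained values instead of zeros
Q_online = q_learning(Q_offline.copy(), ToyCellEnv(seed=1), episodes=50)
```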

Giri and Majumder (2022), instead, leverage a Deep Q-Learning algorithm to optimize resource allocation in energy-harvesting cognitive radio networks, where primary users’ networks share channel resources with secondary users and nodes can harvest energy from the environment, such as solar or wind. The proposed approach addresses the dynamic allocation of resources to achieve optimal network capacity and throughput, considering QoS, energy constraints, and interference limitations. Moreover, the authors utilize both linear and non-linear energy-harvesting models, proposing a novel reward function that incorporates the non-linear model. The proposed approach works in environments characterized by continuous states (*), discrete actions (*), and a stochastic transition model (*), and it has been tested on a synthetic dataset using reward, capacity, network lifetime, and average delay as performance measures. Giri and Majumder (2022) address the limitations of Q-Learning-based allocation methods, thus allowing the approach to deal with high-dimensional problems, improve convergence, and efficiently harness the collected energy to meet the network’s QoS requirements.

Internet traffic has increased in recent years, and in the development of next-generation networks it is important to address the QoS issue sustainably. In this context, Al-Jawad et al. (2021) propose an RL-based algorithm, named Reinforcement lEarning-based Dynamic rOuting (REDO), to solve routing problems in a Software Defined Network (SDN) environment. The proposed approach leverages Q-Learning to handle traffic flows by determining the most appropriate routing strategy among a set of conventional routing algorithms, with the aim of maximizing the number of flows meeting the Service Level Agreement in terms of throughput, packet loss, and rejection rate. In developing REDO, the authors consider an environment with discrete state (*) and action spaces (*). The performance of the proposed approach is evaluated on a synthetic dataset in terms of throughput, packet loss, rejected flows, PSNR, and Mean Opinion Score (MOS). In the development of next-generation networks like SDN, Al-Jawad et al. (2021) address the problem of providing QoS sustainably through the solution of a traffic flow routing problem.

5.3.11 Electric energy

One way to increase environmental sustainability is to improve the energy efficiency of smart hubs. To this aim, Sheikhi et al. (2016) present the Smart Energy Hub framework, which models distinct energy infrastructures in a unified way. The authors’ goal consists of optimizing the electrical and natural gas consumption of a residential customer through the use of Q-Learning. Moreover, to improve and support information management among users and utility service providers, the proposed framework leverages Cloud Computing based systems. In this case, the only technical information available concerns the use of a stochastic transition model and a synthetic dataset. To evaluate the performance of the proposed approach, the metrics used are the storage charge level, the operational cost, and the primary energy involved. As regards dynamic load management in smart hubs, the authors tackle two issues. The first is related to energy system parameters, which are often assumed to be constant but in practice can vary over time or be stochastic. The second is related to the conventional smart grid architecture, which has several reported issues, including exposure to cyber-attacks, single points of failure, limited memory and storage capacity in the energy management system, and difficulties in implementing real-time early warning systems due to limited energy and bandwidth resources.

In the context of smart energy networks, Harrold et al. (2022) consider a microgrid environment and leverage DRL to control a battery for energy arbitrage and for an increased use of renewable energies, namely solar and wind energy. Specifically, the authors apply the Rainbow Deep Q-Network (Hessel et al. 2018) algorithm and augment the agent’s information with values of demand, Renewable Energy Source (RES) generation, and energy price predicted by an Artificial Neural Network. In this work, the environment considered is characterized by continuous states (*) and discrete actions. The authors test the proposed approach on a dataset that combines real-world and synthetic data and assess the prediction accuracy using the MAPE. Also, they evaluate the performance of the proposed approach through energy cost savings, relative savings, episodic rewards, and the value distribution. This work tackles the problem of controlling an Energy Storage System in a microgrid with its own demand, RES generation, and dynamic energy pricing, in order to perform energy arbitrage and improve the use of RES, leading to reduced energy costs. Finally, the authors point out that the limited availability of data requires an efficient algorithm training procedure.
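
A minimal sketch of this state-augmentation step is shown below, assuming a hypothetical `forecaster` object (standing in for the ANN mentioned above) and placeholder observation fields; it does not reproduce the actual interface of Harrold et al. (2022).

```python
import numpy as np

def augment_observation(obs, forecaster, horizon=12):
    """Append forecast demand, renewable generation, and price for the next
    `horizon` steps to the raw microgrid observation, so that the value-based
    agent can condition its battery actions on the predictions."""
    forecast = forecaster.predict(obs, horizon)   # hypothetical forecaster API
    return np.concatenate([
        np.asarray(obs, dtype=float),   # e.g. battery level, current demand, RES, price
        forecast["demand"],
        forecast["renewables"],
        forecast["price"],
    ])
```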

A key aspect of sustainability and cost-effectiveness in grid operation is optimal energy dispatch. Jendoubi and Bouffard (2022) address a multi-dimensional power dispatch problem within a power system by leveraging MARL, specifically the MADDPG algorithm. The proposed control framework adopts CTDE to improve the coordination among dispatchable units without the need for communication, thus mitigating data privacy and communication issues. The environment considered in developing the presented approach has continuous actions and a stochastic transition model (*). The dataset used to evaluate the performance is synthetic, and the proposed method is evaluated in terms of the annual total cost, the variation of the daily operation cost, photovoltaic (PV) production, aggregated demand, the amount of power to be charged/discharged, the amount of power provided by a diesel generator, the amount of electricity delivered by the electricity provider, the difference in the amount of electricity delivered by the electricity provider between two consecutive time steps, and the Peak-to-Average Ratio (PAR). The authors address the energy dispatch aspects related to the development of control strategies for distributed energy resources in grid operation, with the aim of simultaneously reducing costs and delays and allowing local coordination among energy resources.
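
For reference, the centralized-critic gradient underlying MADDPG-style CTDE is

\[
\nabla_{\theta_i} J(\mu_i) = \mathbb{E}_{\mathbf{x},\mathbf{a}\sim\mathcal{D}}\Big[\nabla_{\theta_i}\mu_i(o_i)\;\nabla_{a_i} Q_i^{\mu}\big(\mathbf{x}, a_1,\dots,a_N\big)\big|_{a_i=\mu_i(o_i)}\Big],
\]

where the critic \(Q_i^{\mu}\) observes the joint state \(\mathbf{x}\) and all agents’ actions during training, while each dispatchable unit \(i\) executes its policy \(\mu_i\) using only its local observation \(o_i\); this is what allows coordination without communication at execution time.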

5.3.12 Energy

In recent years, international trade and container handling at port terminals have increased greatly. Improving sustainability in port operations closely relates to the energy consumption at Automated Container Terminals, where Automatic Stacking Cranes (ASCs) are used to load, unload, and pile containers. In this context, Gao et al. (2023) propose a digital twin-based approach for container yard management. Specifically, this work focuses on determining the optimal allocation of container tasks and the scheduling of ASCs to reduce the energy consumption of ASCs while maintaining efficient loading and unloading operations. The proposed approach leverages a virtual container yard to simulate the operating plan and a mixed-integer programming model to optimize the scheduling problem while taking energy consumption into account. Finally, the authors use the Q-Learning algorithm to determine the optimal scheduling plan and minimize energy consumption. The environment considered in this work presents discrete actions (*) and a stochastic (*) transition model. The performance of the proposed approach is evaluated on a real-world dataset using the working and non-working energy consumption and the ratio between them as metrics. To improve the sustainability of port operations, this work addresses the problem of optimizing container yard operations to minimize energy consumption. Indeed, several factors can introduce randomness and uncertainty into these operations, and an incorrect distribution of tasks can lead to suboptimal utilization of ASCs.

5.3.13 Wireless sensor network

Regarding embedded systems powered by a renewable energy source, such as an Energy Harvesting Wireless Sensor Node (EHWSN), Hsu et al. (2014) present a method called Reinforcement Learning-based throughput on-demand provisioning dynamic power management (RLTDPM). By leveraging the Q-Learning algorithm, the proposed approach allows the EHWSN to adapt its operational duty cycle to satisfy both the energy neutrality condition and the throughput on-demand (ToD) requirement, ensuring perpetual operation. In developing RLTDPM, the authors regard an environment characterized by discrete states (*) and actions (*) and evaluate the performance on a synthetic dataset by considering the residual battery energy (RBE), the exercised duty cycle (EDC), the offset to the required ToD (OTRT), and the ToD achievability. In this work, the authors address the problem of simultaneously achieving two mutually conflicting goals, i.e., satisfying the ToD requirement and reducing power consumption.

Energy-Harvesting Wireless Sensor Networks (WSNs) are widely employed in problems characterized by energy-constrained operations. In particular, Chen et al. (2016) focus on Solar-Powered Wireless Sensor Networks (SPWSNs) and present an RL-based Sleep Scheduling for Coverage algorithm to improve the sustainability of SPWSN operations. The proposed approach leverages a precedence operator in the group formation algorithm to prioritize sensors in sparsely covered areas, ensuring the desired coverage distribution. Then, the authors propose a multi-sensor cooperation Q-Learning group model to properly choose the nodes’ working modes by leveraging the developed learning and action selection strategies. The whole group learns the sleep schedule by changing the role of the active node. The environment considered in this work presents discrete state (*) and action (*) spaces and a stochastic transition model. The proposed approach is tested on a dataset that combines real-world and synthetic data, and its performance is evaluated in terms of energy balancing between group members (using the potential energy of nodes as a metric), network lifetime, area coverage ratio, the number of residual alive nodes versus the network lifetime, and the recharging cycle. In this work, the authors tackle a sleep scheduling problem to simultaneously achieve the desired area coverage and the energy balance between group nodes needed to extend the network lifetime.

On the other hand, Feng et al. (2023) propose an RL-based approach to maximize data throughput in self-sustainable WSNs. The authors consider a Mobile Sensor (MS) that collects and transmits data to a fixed sink while moving within the network and harvesting energy from the environment. By leveraging DDPG, the MS can determine the optimal trajectory to optimize the energy harvesting performance and data transmission while dealing with unknown energy supply dynamics. The environment considered in this work is characterized by continuous states and actions and a stochastic transition model (*). Moreover, the performance of the proposed approach is assessed on a synthetic dataset considering as evaluation metrics the distribution of the ratio between the expected per-slot harvested energy and the MS-to-sink distance within the network, the moving trajectories of the MS, reward, actor loss, battery level, accuracy, moving steps, convergence, and training time. The authors tackle two main challenges concerning the optimization of the MS’s trajectory to maximize data throughput. The first relates to the lack of energy-related information, such as the placement of the energy sources, the future energy harvesting potential, and statistical parameters like the average energy harvesting rate. The second consists of the tradeoff between energy harvesting and data transmission: moving closer to energy sources allows the MS to harvest more energy, but it may penalize data transmission because the distance between the MS and the sink may increase.

5.3.14 Autonomous vehicles

In the last decade, Unmanned Aerial Vehicles (UAVs), i.e., drones, have been used in various scenarios, such as rapid disaster response, Search-And-Rescue, and environmental monitoring, where humans are unable to operate in a timely and efficient manner, for example, due to the presence of physical obstacles. Bouhamed et al. (2020) consider the application of UAVs as mobile data collection units in delay-tolerant WSNs. The authors propose a mechanism that exploits two RL techniques, namely the DDPG and Q-Learning algorithms. The proposed approach uses DDPG to determine the best trajectory for a UAV to reach the target destination while avoiding obstacles in the environment. Q-Learning, on the other hand, is used to schedule the best order in which to visit the nodes to minimize the time needed to collect data within a predefined time limit. In this work, the environment presents continuous state (*) and action spaces for the DDPG-based part of the approach and discrete states (*) and actions (*) for the Q-Learning-based one. The proposed mechanism is tested on a synthetic dataset and, to evaluate its obstacle avoidance and scheduling performance, the authors analyze the path followed by the UAV, the reward collected, the UAV’s battery level, and the completion time of the tour against the ground unit transmission power. This work addresses issues related to the limited battery capacity of UAVs and the challenges of navigating in obstacle-prone environments to enable communication between the UAV and low transmission power sensors.

Sacco et al. (2021) propose a MARL approach based on the actor-critic framework to tackle the task offloading problem for UAV swarms in edge computing environments, with the aim of simultaneously reducing task completion time and improving energy efficiency. The proposed approach determines a distributed decision strategy through the collaboration among the system’s mobile nodes, which share information about the overall system state. This information is then used by the agents to decide whether to compute a task locally or offload it to the edge cloud; in the latter case, the proposed technique chooses the best transmission technology between Wi-Fi access points and the mobile network. In developing the proposed approach, the environment considered presents continuous states (*) and discrete actions (*), and the dataset used for testing combines real-world and synthetic data. The performance of the presented technique is assessed in terms of task completion time and utility against a varying number of agents and average node-antenna distance. Then, the authors evaluate the energy consumption necessary to complete the task by varying the average node-antenna distance and the computing workload, and assess the task completion time against the average computing workload. In addition, the cumulative distribution function (CDF) and the evolution of the utility through episodes are considered to analyze the variability of performance among nodes and the convergence, respectively. Finally, in this work, the authors tackle the problem of reducing the task completion time of UAV swarms by offloading tasks to the edge cloud while improving energy efficiency.

In the context of autonomous driving, Gu et al. (2023) tackle the application of RL methods focusing on energy-saving and environmentally friendly driving strategies within a cooperative adaptive cruise control platoon. More precisely, the goal of this work consists of training platoon member vehicles to react effectively when the leading vehicle faces a severe collision. The authors leverage the Policy Gradient algorithm to train an RL agent that minimizes the energy consumption of inter-vehicle communication for decision-making while avoiding collisions or minimizing the resulting damage. To this aim, two different loss functions are used, i.e., a collision loss and an energy loss. Moreover, a specifically designed reward function both ensures the vehicle’s safety and accounts for the fuel consumption resulting from the action performed by the vehicle. This work considers an environment characterized by continuous states (*), discrete actions, and a stochastic transition model (*). The proposed approach has been tested on a synthetic dataset using energy loss, collision loss, reward, and lane changes as metrics. A key challenge of green autonomous driving addressed in this work is the development of effective strategies that respond to environmental observations by automatically generating appropriate control signals.
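
For reference, the vanilla Policy Gradient update estimates

\[
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\Big[\sum_t \nabla_\theta \log \pi_\theta(a_t\mid s_t)\, G_t\Big],
\]

where \(G_t\) is the return from step \(t\); in a setting like that of Gu et al. (2023), a reward combining the collision and energy terms would enter the update through \(G_t\).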

6 Discussion

The analysis of the literature performed in this work shows that most of the works on RL for environmental sustainability concern the energy application domain, followed by urban traffic and transportation. The main RL technique used in the reviewed manuscripts is Q-Learning. Concerning the 35 selected articles, we observe that most of the papers deal with energy-related issues, and about half of them leverage DRL approaches, such as DQN and DDQN. In developing the proposed methods, the authors mainly consider domains with discrete state and action spaces and stochastic transition models, using synthetic datasets to evaluate the performance.

Problems related to environmental sustainability were traditionally tackled with optimization techniques, in which the concept of adaptability has to be introduced explicitly. In contrast, one of the strengths of RL is its natural way of dealing with adaptability to changing or different environments, a crucial feature in environmental sustainability problems, since in this context the agent has to handle variations in operating conditions due to, for example, changes in resource availability or weather conditions. For instance, Chen et al. (2016) introduce an RL-based Sleep Scheduling for Coverage (RLSSC) approach to ensure sustainable time-slotted operations in solar-powered wireless sensor networks. This algorithm is compared to LEACH (Heinzelman et al. 2002), a highly energy-efficient hierarchical routing protocol in which the node chosen to be active in the current round is ineligible for selection in the subsequent round, and to a random algorithm that determines the active nodes within a group at random. Among the various aspects considered, a crucial criterion for evaluating algorithm effectiveness lies in maintaining equilibrium in energy levels, as significant disparities in residual energy arise when a node receives an energy supplement. RLSSC initially exhibits fluctuations but eventually converges through iterative learning, showing only slight oscillations in response to the varying solar strength throughout the day. Moreover, the proposed approach demonstrates real-time energy balancing among sensor nodes. In contrast, non-RL-based methods lack the capacity to adapt to the dynamic environment. Another aspect to consider is the network lifetime, where RLSSC excels in adapting to the uncertainties associated with the harvesting time and the amount of acquired energy. This adaptability enables RLSSC to dynamically adjust its scheme in real time, effectively extending the overall network lifetime. This is only one of several examples showing that RL can provide a strong advantage in solving problems related to environmental sustainability because of its natural capability to deal with uncertainty and adaptation in sequential decision-making.

However, we identify several open problems in the application of RL techniques to environmental sustainability. These concern scalability, data efficiency, and the necessity of dealing with large data volumes, which often poses cost challenges. In future developments, it is crucial to improve pre-training methods that allow the generation of initial policies by simulation and leverage knowledge acquired by solving related tasks. RL methods are also sensitive to the reward function; therefore, reward engineering is important to avoid a negative impact on performance. Moreover, in dealing with environmental sustainability problems in specific contexts like IoT, it is particularly important to account for computational limitations and to optimize the computational complexity of the method. Finally, we note that most of the approaches involve single-agent systems. Extending the proposed approaches to the multi-agent context would allow the cooperative computation of optimal policies accounting for common performance objectives, thus improving shared resource management and environmental sustainability.

7 Conclusions

This review focuses on the application of RL techniques to address environmental sustainability challenges, a topic of increasing interest in the international scientific community. We have examined several contexts where RL techniques have been recently used to enhance environmental sustainability, offering practitioners insights into state-of-the-art methodologies across diverse application domains. RL has found practical application in environmental sustainability because the inherent uncertainty of this domain poses challenges to strategy learning and adaptation that can be naturally tackled by RL. The review of the literature performed in this survey has identified the most common applications of RL in environmental sustainability and the most popular methods used to address these challenges in the last two decades. We have first provided a quantitative analysis of the state-of-the-art related to the application of RL in environmental sustainability and then analyzed the use of these techniques, focusing on sustainability concerns. In particular, we have provided an overview of the application domains of the proposed RL techniques and the approaches used for environmental sustainability issues. Moreover, we have narrowed our attention to 35 selected papers and provided technical information on the formalization of the RL problem, the performance measures adopted for evaluation, and the challenges addressed.