Abstract
The challenge of spatial resource allocation is pervasive across domains such as transportation, industry, and daily life. As the scale of real-world problems continues to expand and demands for real-time solutions increase, traditional algorithms face significant computational pressure and struggle to deliver both optimality and real-time performance. In recent years, with growing computational power, the remarkable achievements of reinforcement learning in domains such as Go and robotics have demonstrated its strong capabilities for learning and sequential decision-making. Given these advances, a surge of novel methods has emerged that employ reinforcement learning to tackle spatial resource allocation problems. These methods offer advantages such as rapid solution convergence and strong model generalization, providing a new perspective on the problem. Despite this progress, reinforcement learning still faces hurdles in spatial resource allocation. It cannot yet fully capture the diversity and intricacy of real-world resources; the environmental models it relies on may not represent spatial dynamics accurately; and in settings with numerous strict constraints, it can fail to produce feasible strategies. Consequently, this paper is dedicated to summarizing and reviewing current theoretical approaches and applied research that use reinforcement learning to address spatial resource allocation. In addition, the paper highlights several unresolved challenges that urgently require future attention in this area and proposes viable approaches to them.
This research furnishes valuable insights that may assist scholars in gaining a more nuanced understanding of the problems, opportunities, and potential directions concerning the application of reinforcement learning in spatial resource allocation.
1 Introduction
Conventional methods for solving spatial resource allocation problems (Eyles et al., 1982; Long et al., 2018) involve planners allocating the available spatial resources effectively using a small number of reasonable parameters. Scientific and rational allocation parameters can effectively reduce resource operating costs and improve resource utilization and output efficiency. For instance, in allocating freight warehouse space, scientific and reasonable parameters can enhance the utilization rate of warehouse capacity and expedite the flow of goods (Azadivar, 1989; Sanei et al., 2011). Achieving such objectives depends on the effective planning of resource allocation. In enterprise planning, the first consideration is a scientific and reasonable layout of the factory area to ensure that logistics and information flows run smoothly (Benjaafar et al., 2002; Naranje et al., 2019). When planning public-service facilities, it is essential to weigh construction and maintenance costs against service coverage so as to reduce the travel distances and waiting times users face in obtaining good service (Suchman, 1968; Wang et al., 2021). Overall, the demands of effective resource allocation planning make the spatial resource allocation problem a typical example of non-deterministic polynomial-time hardness (NP-hardness) (Hochba, 1997). This complexity arises from challenges in decision-making environments, specification descriptions, intertwined spatial structures, and scale effects that inhibit the search for high-quality solutions.
Numerous studies have explored the complex computational challenges inherent in spatial resource allocation optimization, utilizing both exact and heuristic approaches. Exact algorithms in spatial resource allocation typically define a comprehensive set of decision variables, constraints, and objective functions to optimize the allocation, minimize costs, enhance efficiency, or fulfill predefined service levels. These algorithms encompass Linear Programming (Schrijver, 1998), Mixed Integer Linear Programming (Floudas & Lin, 2005), Dynamic Programming (Bellman, 1966), among others, each tailored to address distinct facets of spatial resource allocation conundrums. For instance, in tackling facility location issues, approaches like Branch and Cut (Parragh et al., 2022), Lagrangian Relaxation (Cabezas & García, 2023), and Column Generation (Etebari, 2019) are employed to ensure coverage of each demand point within a designated service radius. The principal obstacle with exact algorithms is their scalability: as the size and complexity of the problem swell, the time required to find a solution tends to grow exponentially. In real-world scenarios that involve extensive spatial datasets and intricate constraints, exact algorithms may fall short in furnishing a viable solution in a timely manner. For instance, in the delineation of service areas, certain exact algorithms that address diverse constraints such as connectivity, balance, and compactness (Li et al., 2007; Nemoto & Hotta, 2003) may not account for all constraints comprehensively. Even algorithms capable of satisfying multiple constraints like continuity and balance may prove impractical outside of limited-scale applications (Kong et al., 2017; Salazar-Aguilar et al., 2011). Furthermore, the intrinsic heterogeneity of spatial data and the pronounced non-linearity of the associated problem models often present formidable hurdles for conventional exact algorithms.
Heuristic algorithms present a viable and efficient alternative to precise methods for solving large and highly complex problems. They leverage intuition or empirical guidelines to quickly identify feasible or suboptimal solutions. In spatial resource allocation, their variety and adaptability enable the management of diverse constraints and objectives. While not ensuring optimal outcomes, heuristics provide expedient solutions for extensive datasets. Representative heuristic algorithms include Greedy algorithms (Long et al., 2018), Local Search (Johnson et al., 1988), Simulated Annealing (Murray & Church, 1996), Genetic Algorithms (Juels & Wattenberg, 1995), Ant Colony Optimization (Chaharsooghi & Kermani, 2008), and Particle Swarm Optimization (Gong et al., 2012), among others. Each of these employs a unique approach to navigating the search space and often finds good solutions with less computational effort than exhaustive search methods. For instance, the AZP algorithm (Openshaw, 1977) sought to iteratively refine solutions while maintaining spatial connectivity, yet tended to get stuck in local optima. To address this, methods like tabu search (Baldassarre et al., 2023; Duque & Church, 2004), simulated annealing (Jiang et al., 2023; Ko et al., 2015), and ant colony optimization (Mimis et al., 2012) were introduced to enhance solution diversity and partitioning. Additionally, tree-based heuristic (Assunção et al., 2006; Côme, 2024; Guo, 2008) approaches have been developed to simplify neighborhood searches, structuring spatial units into connected graphs for improved partitioning, all while preserving connectivity. Heuristics are extensively used in spatial resource allocation; however, their efficiency can be hampered by large, high-dimensional, or complex problem structures, potentially leading to unsatisfactory solutions. Their effectiveness also heavily relies on the input of expert knowledge and experience.
With the continuous development of information technology and the increasing availability of information storage devices, industries have been accumulating large amounts of data, e.g., taxi trajectories (Al-Dohuki et al., 2016; Liu et al., 2019), taxi orders (Tong et al., 2021; Zhang et al., 2017), and Point of Interest data (Liu et al., 2013; Yuan et al., 2013). The key open question requiring further research is how to make full use of such data and discover the rules and strategies it contains in order to solve spatial resource optimization problems effectively, because conventional approaches fail to extract such insights from it.
In recent years, reinforcement learning (RL) methods have made breakthroughs in games (Kaiser et al., 2019; Lample & Chaplot, 2017; Littman, 1994), Go (Bouzy & Chaslot, 2006; Silver et al., 2018; Silver et al., 2007), autonomous driving (J. Chen et al., 2019a, 2019b; Kiran et al., 2021; Sallab et al., 2017; Shalev-Shwartz et al., 2016), robot control (Brunke et al., 2022; Johannink et al., 2019; Kober et al., 2013), pedestrian evacuation route planning (Li et al., 2024; Mu et al., 2023; Xu et al., 2020; Xu et al., 2021), among others. It has become a new research focus in the era of artificial intelligence and has brought new opportunities for optimizing spatial resource allocation (Barto & Sutton, 1997; Feriani & Hossain, 2021). Reinforcement learning can achieve nearly real-time decision-making, since the training required to produce effective models can be performed offline (Levine et al., 2020; Ramstedt & Pal, 2019; Skordilis & Moghaddass, 2020). Nor does it require an explicit model of the scene logic: it learns from experience gathered through data interactions and gradually discovers rules and strategies that yield effective models (Degris et al., 2012; Strehl et al., 2006). Moreover, when combined with deep learning, reinforcement learning gains the capacity to process large-scale data and to discover and extract low-level features efficiently (Arulkumaran et al., 2017; Li, 2017). Together, these characteristics make reinforcement learning well suited and robust for handling spatial resource allocation problems with big data.
This paper aims to conduct a comprehensive investigation into the application of reinforcement learning in the field of spatial resource allocation. To highlight the differences in characteristics and optimization objectives among various application scenarios, we categorize the applications of reinforcement learning in spatial resource allocation into three major classes: a) static demand resource allocation, b) static resource allocation, and c) dynamic resource allocation. In Chapter 2, we provide an introduction to the basic concepts and algorithms of reinforcement learning and deep reinforcement learning, elucidating the advantages of their application in spatial resource allocation. Chapters 3 to 5 review the latest research progress in the application of reinforcement learning within the three aforementioned categories, considering the distinct characteristics and optimization objectives of spatial resource allocation scenarios. Chapter 6 provides an overview of the challenges encountered when applying reinforcement learning to spatial resource allocation, as well as potential solutions, aiming to offer insights for future research directions. Finally, Chapter 7 summarizes and concludes the paper. The content structure of this review is depicted in Fig. 1.
2 Literature review
Artificial intelligence has achieved unprecedented development with the advent of the era of big data and the continuous improvement of computer computational power (Duan et al., 2019; O'Leary, 2013). From the initial Turing test (Moor, 1976; Pinar Saygin et al., 2000) to the subsequent conception of artificial intelligence, scientists have been working hard to make computers or robots possess "intelligence" and be able to learn to observe and act according to "consciousness" like humans. Artificial intelligence has even surpassed humans in tasks such as big data analysis (Kibria et al., 2018; Zhu, 2020), chess (Hassabis, 2017; Schrittwieser et al., 2020), disease diagnosis (Kumar et al., 2022; Shen et al., 2019), and video games (Perez-Liebana et al., 2016; Skinner & Walmsley, 2019). AI technology is also widely used in other applications, such as weather forecasting (Anđelković & Bajatović, 2020; Baboo & Shereef, 2010; Liu et al., 2014), material design (Feng et al., 2021; Guo et al., 2021), recommendation systems (Verma & Sharma, 2020; Zhang et al., 2021), machine perception and control (Chalmers et al., 1992; Wechsler, 2014), autonomous driving (Atakishiyev et al., 2021), face recognition (Beham & Roomi, 2013; Sharma et al., 2020), speech recognition (Al Smadi et al., 2015; Nassif et al., 2019), and dialogue systems (Deriu et al., 2021; Ni et al., 2023). An artificial intelligence system needs the ability to learn from raw data, which Arthur Samuel (Samuel, 1959) called Machine Learning. The usual process by which artificial intelligence solves problems is to design targeted pattern recognition algorithms that extract valid features from raw data, which are then used in conjunction with machine learning algorithms. Machine learning can be divided into supervised learning (Nasteski, 2017), unsupervised learning (Alloghani et al., 2020), and reinforcement learning (Sutton, 1992).
A key difference between reinforcement learning and the other paradigms is that reinforcement learning is a self-supervised learning method (Xin et al., 2020). On one hand, the agent is trained on action and reward data, optimizing its action strategies; on the other hand, it autonomously interacts with the environment, receiving feedback based on the outcomes of state transitions. Reinforcement learning has demonstrated exceptional performance across various domains, including robot control (Brunke et al., 2022), path planning (Panov et al., 2018), video games (Jaderberg et al., 2019), autonomous driving (Kiran et al., 2021), and more.
2.1 Reinforcement learning
2.1.1 Algorithm composition
The main body of reinforcement learning comprises two parts: the agent and the environment. Learning does not require supervised signals; instead, it relies on the reward signal the agent receives as feedback from the environment. The agent's state and actions are adjusted according to this feedback signal so that the agent gradually maximizes its reward. As a result, reinforcement learning exhibits a strong capacity for self-learning. A standard reinforcement learning algorithm consists of four elements: a policy function, a reward function, a value function, and an environment model.
2.1.2 Algorithm framework
In reinforcement learning, taking a particular action does not always lead to a specific state. Generally, the likelihood of transitioning to a state after an action is represented through a state transition model. In a real environment, the probability of transitioning to the next state may depend on many previous states. To streamline the state transition model, reinforcement learning assumes that the probability of moving to the next state depends only on the current state. Consequently, the entire reinforcement learning process can be formalized as a Markov Decision Process (Sutton et al., 1999), which serves as the fundamental framework for reinforcement learning. A Markov Decision Process is represented by a five-tuple \(<S,A,P,R,\gamma >\), where \(S\) is the set of states, \(A\) is the set of actions, \(P\) is the state transition probability function, \(R\) is the reward function, and \(\gamma\) is the discount factor.
Reinforcement Learning operates as a Markov Decision Process where an agent learns to make decisions aimed at achieving specific goals through interactions with its environment. In this process, the agent observes environmental states, chooses actions guided by a particular strategy, and receives corresponding rewards or penalties from the environment. The agent's goal is to maximize long-term rewards by experimenting with diverse strategies. The following Fig. 2 illustrates the core process of reinforcement learning.
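This agent-environment loop can be sketched with tabular Q-learning on a toy Markov Decision Process; the transition probabilities and rewards below are purely illustrative, not drawn from any surveyed work.

```python
import numpy as np

# Toy MDP <S, A, P, R, gamma>: 2 states, 2 actions (illustrative values).
n_states, n_actions, gamma = 2, 2, 0.9
# P[s, a, s'] = transition probability; R[s, a] = expected reward.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

rng = np.random.default_rng(0)
Q = np.zeros((n_states, n_actions))
alpha, epsilon = 0.1, 0.1
s = 0
for _ in range(20000):
    # Epsilon-greedy action selection (exploration vs. exploitation).
    a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=P[s, a])
    # Q-learning update toward the reward plus discounted best next value.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(Q)
```

With enough interaction, the learned action values reflect the long-term discounted reward of each state-action pair, even though the agent never sees the transition model `P` directly.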
2.2 Category
Based on current research, reinforcement learning algorithms can be categorized according to several criteria, such as the function used to update the learning policy and the number of agents in the environment. Function-based classification groups reinforcement learning algorithms into three categories: value-based, policy-based, and Actor-Critic methods. By the number of agents in the environment, reinforcement learning algorithms are classified into single-agent and multi-agent categories. In a multi-agent system, where multiple agents interact with the environment, each agent still pursues its own reinforcement learning objective, but the change in the environment's overall state depends on the joint actions of all agents. Considering the impact of joint actions therefore becomes crucial during agent policy learning. Table 1 summarizes the categorization of reinforcement learning algorithms, including the principles, representative algorithms, advantages, and limitations of each type.
2.2.1 Value-based RL
The value-based reinforcement learning algorithm implicitly shapes the optimal policy by deriving the optimal value function and selecting the actions that maximize it. Notable algorithms in this category include Q-learning (Watkins & Dayan, 1992), SARSA (Zhao et al., 2016), and Deep Q-Network (DQN) (Fan et al., 2020). Deep Q-Network merges the Q-learning algorithm with a deep neural network, commonly employing a fully connected or convolutional neural network to build the model and the Q-learning algorithm for training. This effectively addresses the computational inefficiency and limited memory capacity of tabular Q-learning. However, owing to DQN's overestimation and related drawbacks, various optimization algorithms emerged, such as Double Deep Q-Network (DDQN) (Van Hasselt et al., 2016), Dueling DQN (Wang et al., 2016), DQN with dynamic frame skipping (Srinivas et al., 2016), Prioritized Experience Replay (PER) (Schaul et al., 2015), Noisy DQN (Fortunato et al., 2017), Distributional DQN (Dabney et al., 2018), and Rainbow DQN (Hessel et al., 2018). Although value-based algorithms offer benefits such as high sample efficiency, low variance in value function estimates, and resilience to local optima, they commonly struggle with continuous action spaces.
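The overestimation issue that motivated DDQN stems from how the bootstrap target is formed. A minimal sketch with illustrative Q-values (in a real DQN these come from neural networks) shows how Double DQN decouples action selection from action evaluation:

```python
import numpy as np

gamma = 0.99
reward = 1.0
# Illustrative next-state Q-value estimates from the online and target
# networks (in DQN these are outputs of neural networks).
q_online = np.array([1.2, 0.9, 1.1])
q_target = np.array([0.8, 1.0, 0.7])

# Standard DQN: the target network both selects and evaluates the action,
# so a single overestimated entry inflates the bootstrap target.
dqn_target = reward + gamma * q_target.max()

# Double DQN: the online network selects the action, the target network
# evaluates it, decoupling selection from evaluation.
a_star = int(q_online.argmax())                  # selection
ddqn_target = reward + gamma * q_target[a_star]  # evaluation

print(dqn_target, ddqn_target)
```

Because the two networks rarely overestimate the same action, the Double DQN target is typically lower and less biased than the plain DQN target.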
2.2.2 Policy-based RL
Given the limitations of value-function-based reinforcement learning with respect to continuous action spaces and stochastic policies, researchers have proposed various policy-based reinforcement learning approaches. In policy-based algorithms, the agent directly produces the probabilities of potential actions for the next time step and selects actions according to these probabilities. These algorithms parameterize the policy, use the expected cumulative return as the objective function, and optimize it through policy gradient methods (Silver et al., 2014). The stochastic policy search method learns the parameterized policy directly from policy gradients. It bypasses the need to solve a value-maximization optimization over the action space, making it better suited to high-dimensional or continuous action space problems. Notable algorithms in this category include REINFORCE (Williams, 1992) and Trust-PCL (Nachum et al., 2017). In contrast to stochastic policies, deterministic policies map each state to a unique action. Representative algorithms include Deterministic Policy Gradient (DPG) (Silver et al., 2014), Deep Deterministic Policy Gradient (DDPG) (Lillicrap et al., 2015), and TD3 (Fujimoto et al., 2018). However, policy-based reinforcement learning has certain drawbacks: it can be computationally intensive and may require long iteration times on complex problems.
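The policy-gradient idea behind REINFORCE can be sketched on a toy two-action problem: a softmax policy over logits is nudged in the direction of the log-probability gradient, weighted by the observed reward. The rewards below are illustrative, and a running-average baseline (a standard variance-reduction addition) is included.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)                   # policy parameters (softmax logits)
true_rewards = np.array([0.2, 1.0])   # illustrative expected rewards
alpha, baseline = 0.1, 0.0

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for _ in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)                  # sample an action from the policy
    r = true_rewards[a] + rng.normal(0, 0.1)    # noisy reward signal
    # grad of log pi(a|theta) for a softmax policy: one-hot(a) - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0
    # REINFORCE update: ascend the return-weighted log-probability gradient.
    theta += alpha * (r - baseline) * grad_log_pi
    baseline += 0.05 * (r - baseline)           # running-average baseline

print(softmax(theta))
```

Probability mass shifts toward the higher-reward action without ever estimating a value function over actions, which is precisely what makes the approach viable for large or continuous action spaces.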
2.2.3 Actor-Critic RL
The Actor-Critic algorithm combines aspects of both value-based and policy-based reinforcement learning methods. Its architecture involves two key components: the Actor, which employs policy methods to approximate the policy model by generating actions and interacting with the environment, and the Critic, which employs value methods to assess the advantages and disadvantages of actions and approximates the value function. Subsequently, the Actor optimizes the action probability function based on the Critic's evaluations, guiding the agent to choose optimal actions. This approach conducts policy evaluation and optimization using the value function while refining the policy function to enhance the accuracy of state value representation. These intertwined processes converge to derive the optimal policy. In recent years, several Actor-Critic algorithms have emerged, including Advantage Actor-Critic (A2C) (Grondman et al., 2012), Asynchronous Advantage Actor-Critic (A3C) (Mnih et al., 2016), Trust Region Policy Optimization (TRPO) (Schulman et al., 2015), Proximal Policy Optimization (PPO) (Schulman et al., 2017), Distributed PPO (Zhang et al., 2019) and Soft Actor-Critic (Haarnoja et al., 2018), etc. However, adjusting the parameters of the Actor-Critic algorithm can be challenging, and the consistency of results when applied may not be guaranteed.
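A single advantage-actor-critic-style update on one illustrative transition shows how the Critic's TD error guides the Actor; all numbers here are assumptions for the sketch, not taken from any surveyed algorithm.

```python
import numpy as np

# One actor-critic update on a toy transition (values illustrative).
gamma, lr_actor, lr_critic = 0.99, 0.1, 0.1

theta = np.zeros(2)        # Actor: softmax policy logits
V = np.array([0.5, 0.0])   # Critic: state-value estimates for 2 states

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

s, a, r, s_next = 0, 1, 1.0, 1   # observed transition

# Critic: the TD error serves as an advantage estimate.
td_error = r + gamma * V[s_next] - V[s]
V[s] += lr_critic * td_error

# Actor: shift the policy toward actions the Critic judged
# better than expected (positive TD error).
probs = softmax(theta)
grad_log_pi = -probs
grad_log_pi[a] += 1.0
theta += lr_actor * td_error * grad_log_pi

print(td_error, theta)
```

The Critic's evaluation (the TD error) replaces the raw return used in pure policy-gradient methods, which is what lowers the variance of the Actor's updates.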
2.2.4 Multi-agent RL
Complex real-world scenarios often demand collaboration, communication, and confrontation among multiple agents, such as production robots (Giordani et al., 2013), urban traffic lights (Wu et al., 2020), and e-commerce platforms (Rosaci & Sarnè, 2014), all of which constitute typical multi-agent systems. The basic idea of multi-agent reinforcement learning is the same as that of single-agent reinforcement learning: the agent interacts with the environment and improves its strategy according to the rewards it obtains. However, in contrast to single-agent settings, multi-agent environments are notably more intricate. The state space and the joint action space grow exponentially with the number of agents. Each agent must navigate a dynamic environment that includes interactions with other agents, which intensifies the learning complexity. Additionally, as multiple agents interact with the environment concurrently during learning, the actions of one agent can alter the environment and thereby influence its own decision updates, and the influence of other agents on the environment perpetually affects subsequent decisions. The evolution of multi-agent reinforcement learning is deeply intertwined with game theory, since the learning process resembles a stochastic (Markov) game.
Classic multi-agent reinforcement learning algorithms based on game theory include Nash Q-Learning (Hu & Wellman, 2003), Team Q-learning (Cassano et al., 2019), Minimax-Q-Learning (Zhu & Zhao, 2020), and Friend-or-Foe Q-learning (Littman, 2001). The development of deep learning has allowed researchers to move beyond the small-scale, simple problems to which classical multi-agent reinforcement learning algorithms were limited. By extending or improving single-agent reinforcement learning algorithms, researchers have proposed Independent Double Deep Q-Network (IDDQN) (Wang et al., 2022), Mean Field Multi-Agent Reinforcement Learning (MFMARL) (Yang et al., 2018), and Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (Lowe et al., 2017), among others. The advantage of multi-agent reinforcement learning lies in handling complex interactive environments, yet it faces challenges related to high-dimensional state spaces and instability.
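The simplest multi-agent scheme, independent Q-learning, can be sketched on a two-agent coordination game with illustrative payoffs. Each agent keeps its own Q-table, yet its reward depends on the joint action, which is exactly why the environment appears non-stationary from each agent's point of view.

```python
import numpy as np

rng = np.random.default_rng(0)
# payoff[a1, a2]: shared reward for the joint action (illustrative).
payoff = np.array([[4.0, 0.0],
                   [0.0, 2.0]])
Q1, Q2 = np.zeros(2), np.zeros(2)
alpha, epsilon = 0.1, 0.2

for _ in range(5000):
    # Each agent acts epsilon-greedily on its own Q-table.
    a1 = rng.integers(2) if rng.random() < epsilon else int(Q1.argmax())
    a2 = rng.integers(2) if rng.random() < epsilon else int(Q2.argmax())
    r = payoff[a1, a2]
    # Independent updates: each agent treats the other as part of the
    # environment, ignoring that the other's policy is also changing.
    Q1[a1] += alpha * (r - Q1[a1])
    Q2[a2] += alpha * (r - Q2[a2])

print(Q1, Q2)
```

In this simple game the agents coordinate on the higher-payoff joint action, but in general the mutual non-stationarity visible in the update rule is what motivates the game-theoretic and centralized-training algorithms listed above.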
2.3 Potentiality of applying reinforcement learning to spatial resource allocation
In the spatial resource allocation problem, planners must allocate spatial resources reasonably and effectively according to the current allocation of resources, the given constraints, and the goals to be achieved. Scientific and rational allocation of spatial resources can effectively improve resource utilization efficiency, reduce facility operating costs, and improve output efficiency. Spatial resource allocation is a typical NP-hard problem, involving multiple challenges such as complex problem specifications and interleaved spatial structures. The potential of reinforcement learning in spatial resource allocation stems from its fundamental principle of learning through the interaction between an agent and its environment. In resource allocation problems, the agent (representing decision-makers or systems) needs to make a series of decisions based on the environmental state to achieve optimal resource utilization.
In the context of spatial resource allocation problems, the state may encompass information regarding the location, quantity, availability of resources, and environmental attributes related to resource allocation. Actions refer to the decisions made by an agent concerning resource allocation, such as determining which areas or locations resources should be allocated to. Rewards typically assess the agent's decision-making based on resource utilization efficiency, coverage scope, cost savings, or other specific objectives. Through continual interaction and experimentation with the environment, the agent adjusts its behavioral strategies based on received reward signals, progressively learning the optimal resource allocation scheme. This learning process involves striking a balance between exploration and exploitation; the agent needs to explore novel decision choices to discover better solutions while leveraging existing knowledge to enhance resource utilization efficiency.
Therefore, based on the principles of reinforcement learning, employing this technique to address various spatial resource allocation problems becomes feasible. Reinforcement learning agents optimize resource allocation strategies incrementally through interactions with the environment, adapting to evolving environmental conditions and demands for more effective resource utilization.
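As a concrete illustration of this formulation, the following sketch casts a small facility-placement problem as an MDP: the state is which cells already host a facility, the action is the cell for the next facility, and the reward is the demand newly covered within the service radius. The grid size, demand values, and radius are all assumed for illustration, and a greedy one-step-lookahead policy stands in for a learned agent.

```python
import numpy as np

rng = np.random.default_rng(0)
size, radius, n_facilities = 8, 1, 3
# Illustrative demand surface on an 8x8 grid.
demand = rng.integers(1, 10, size=(size, size)).astype(float)

def covered(placed):
    """Boolean mask of cells within the service radius of any facility."""
    mask = np.zeros((size, size), dtype=bool)
    for (i, j) in placed:
        mask[max(0, i - radius):i + radius + 1,
             max(0, j - radius):j + radius + 1] = True
    return mask

def step(placed, action):
    """MDP step: reward = demand newly covered by placing at `action`."""
    before = demand[covered(placed)].sum()
    new_placed = placed + [action]
    after = demand[covered(new_placed)].sum()
    return new_placed, after - before

# Greedy one-step-lookahead policy as a stand-in for a learned agent.
placed = []
for _ in range(n_facilities):
    best = max(((i, j) for i in range(size) for j in range(size)),
               key=lambda a: step(placed, a)[1])
    placed, reward = step(placed, best)

print(placed, demand[covered(placed)].sum())
```

A reinforcement learning agent would replace the greedy rule with a policy trained over many episodes, but the state, action, and reward interfaces stay exactly as above.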
3 Reinforcement learning in Static demand-based resource allocation
In this scenario, the state of demand points remains fixed and static, while the location of resource points is selectable, representing a dynamic allocation focused on resource points. Real-world examples include energy distribution (Li et al., 2023), healthcare resource allocation (Xi et al., 2023), and educational resource distribution (Glewwe et al., 2021), which can be categorized into two main groups: facility location problems and service coverage problems. These scenarios aim to optimize the spatial and temporal distribution of resource demand and consumption within a confined space. The objective is to ensure consistency in meeting resource demands and utilization across both time and space, endeavoring to extend resource service provision to a broader set of demand points whenever possible, thereby maximizing the utilization of available resources. Relevant papers tackling these tasks are summarized in Table 2.
3.1 Facility location
For facility location selection, both value-based and policy-based reinforcement learning algorithms have been used. Petratos et al. (2021) and von Wahl et al. (2022) used the DQN algorithm to optimize the location of charging stations in cities. von Wahl et al. used the charging stations' locations, charging demand, and facility cost as the state, and set the agent's action space to create, increase, and relocate; the agent's reward relates to the construction cost of the charging station and its output income. Petratos et al. used a supervised gradient to improve the model's accuracy, describing the state as a 2D grid containing predicted demand values; the action is placing a charger in a grid cell, and the reward is based on the expected demand served by a hypothetical charger in that cell and the area the charger covers. Liu et al. (2023) proposed the policy-based PPO-Attention model, which integrates an attention mechanism into the PPO algorithm, enhancing the algorithm's ability to recognize and comprehend the intricate interdependencies among different nodes in the network. Zhao et al. (2023) proposed a recurrent neural network with an attention mechanism to learn model parameters and determine the optimal strategy in a completely unsupervised manner. Further research attempts to solve facility location selection by defining it as an MDP and incorporating Graph Neural Networks and attention mechanisms within DRL (Chen et al., 2023; Liang et al., 2024; Zhong et al., 2024). In these models, each facility site is considered a node with identical attributes, and when selecting sites the model comprehensively considers the distribution relationships, order, distribution, and coverage information between nodes. Figure 3 shows an abstract facility location case solved by reinforcement learning.
In optimizing the coverage of Unmanned Aerial Vehicle (UAV) base stations to meet diverse ground user needs, Gopi et al. (2021) proposed a Q-learning-based reinforcement learning algorithm to guide the placement of UAV base stations. This study restricts base station movement to the points of a square grid, reducing the number of states, and allows drone base stations to move in fixed incremental distances to maintain a consistent distance between stations. Bhandarkar et al. (2022) utilized the DQN algorithm to select UAV base station locations more rationally. They used the UAV's position and covered users as the state, while the number of newly covered users, user coverage, and whether the UAV exceeded boundaries were treated as rewards. By comparing a reward-based greedy algorithm with the DQN algorithm, the study concludes that DQN outperforms the greedy algorithm in both coverage and latency.
3.2 Wireless sensor network
When exploring the application of reinforcement learning in resource coverage optimization within wireless sensor networks, Nie & Haykin (1999) and El-Alfy et al. (2006) utilized real-time Q-learning combined with neural network representations to address dynamic wireless resource allocation problems. Concurrently, Seah et al. (2007) and Renaud et al. (2006) employed distributed reinforcement learning algorithms such as IL, DVF, COORD, optimistic DRL, and FMQ to optimize the resource coverage and energy consumption in wireless sensor networks. Additionally, Tamba et al. (2021) utilized unmanned aerial vehicles as sensing components within a mobile wireless sensor network, establishing a three-dimensional grid environment and deploying the wireless sensor network as a whole to perform the resource coverage task within a target area.
Addressing coverage gaps in wireless sensor networks, researchers have also employed reinforcement learning for coverage hole repair. Hajjej et al. (2020) proposed a distributed reinforcement learning approach where each intelligent agent selected a combination of node relocation and perception range adjustment actions. Simulation results demonstrate that the proposed method can sustain overall network coverage in the presence of random damage events. Furthermore, Philco et al. (2021) initially employed a multi-objective black widow optimization algorithm and Tsallis entropy-enabled Bayesian probability algorithm for dynamic sensor node scheduling. Subsequently, they utilized a multi-agent SARSA to determine the optimal mobile node for repairing coverage holes.
3.3 Unmanned Aerial Vehicle coverage
The coverage problem for UAVs does not necessitate serving all demands but rather expanding coverage as much as possible with a limited number of service facilities. Liu et al. (2018) proposed DRL-EC3 based on DDPG, considering communication coverage, fairness, energy consumption, and connectivity, guided by two deep neural networks. Wang et al. (2019a, 2019b) utilized the DDQN algorithm to determine UAV positions. The model used UAV positions as the state, with an action space comprising five directional actions and reward functions based on coverage and capacity ratios, achieving optimal real-time capacity by altering UAV positions as ground terminals move. Xiao et al. (2020) proposed a distributed dynamic area coverage algorithm based on reinforcement learning and the γ information graph. The γ information graph transforms the continuous dynamic coverage process into a discrete γ-point traversal process while ensuring coverage without holes. When agent communication covers the entire target area, the agents can learn a globally optimal coverage strategy over the whole dynamic coverage process; when it does not, the agents obtain a locally optimal coverage strategy. Bromo et al. (2023) presented a PPO method for UAV fleet coverage planning in unexplored areas with obstacles, aiming to reduce the steps and energy needed for full coverage while avoiding collisions.
Researchers have also applied multi-agent reinforcement learning to this problem. Pham et al. (2018) modeled the UAV service environment as a 3D grid, defining each UAV's state as its approximate location in the environment. They applied a multi-agent Q-learning algorithm to comprehensively cover service resources in unknown areas of interest while minimizing overlapping fields of view. Meng et al. (2021) introduced the MADDPG algorithm, treating multiple UAVs as agents, taking UAV service coverage rates and locations as states, and setting travel distances and directions as agent actions, ensuring good dynamic coverage while maintaining connectivity of the agent group.
4 Reinforcement learning in static resource-based allocation
In this scenario, the state of resource points remains static, while demand points are subject to dynamic changes. Real-world scenarios of static resource allocation include land use planning (Y. Wang et al., 2023a, 2023b), water resource allocation (Pan et al., 2023), and natural reserve management (Shi et al., 2023), among others. These scenarios are based on the principles of partitioning service spaces and scheduling services, primarily focusing on the rational partitioning or arrangement of demand areas to maximize the service efficiency of resources within a limited space. Relevant papers tackling these tasks are summarized in Table 3.
4.1 Service area partition
Klar et al. attempted to solve factory layout planning problems using reinforcement learning, proposing different state spaces and reward setups for various optimization objectives. Initially, they used DDQN to address a layout scenario involving four functional units, optimizing for transportation time (Klar et al., 2021). Then, for scenarios with numerous functional units, they introduced a novel state representation combined with action masking, optimizing the action selection process to ensure scalability and reduce training time (Klar, Hussong, et al., 2022a, 2022b). Later work focused the state space on specific details and additional information for placing the next functional unit when optimizing for material flow and energy consumption (Klar, Langlotz, et al., 2022a, 2022b). Additionally, Klar et al. proposed a comprehensive framework for reinforcement learning-based factory layout planning, integrating graph neural networks (Scarselli et al., 2008) with DDQN to enhance state feature extraction, applicable in both the initial factory layout planning and restructuring phases (Klar et al., 2023). Other researchers, including Wang et al. (2020), Di et al. (2021), and Ribino et al. (2023), utilized reinforcement learning algorithms such as PEARL, MCTS, and MORL to propose improved furniture placement solutions for households. In particular, Wang et al. transformed a 3D interior graphics scene into two 2D simulation scenes, establishing a simulation environment in which two reinforcement learning agents cooperatively learned the optimal 3D layout through a Markov decision process formulation. Kim et al. (2020) addressed the shipyard layout problem of minimizing the use of transporters during rearrangements using the A3C algorithm. This research involved two agents in the decision-making process: a transporting agent and a locating agent, each with distinct states, actions, and reward functions.
The transporting agent considered stockyard blocks' location and remaining time, choosing the next block arrangement, with the reward function based on the number of blocks moved. Meanwhile, the locating agent dealt with blocks and transporters in the stockyard, deciding whether to move a transporter or carry a block, with the reward based on the success or failure of block export. Figure 4 shows an abstract factory layout planning case by reinforcement learning.
4.2 Service scheduling
The dynamic charging scheduling for electric vehicles aligns with static resource allocation challenges. These strategies select charging stations for vehicles on the move, aiming to cut overall charging times while easing grid pressure. Researchers have used reinforcement learning to propose solutions, varying in state designs, set constraints, and optimization objectives. States typically include Battery State of Charge (Aylor et al., 1992), charging demands, and costs for action selection, crucial for managing distributed energy systems. Optimization often centers on energy and charging costs, attempting to minimize imbalances between generation and consumption. In terms of constraints, the majority of proposed methods utilize energy network constraints to ensure that the objectives of reinforcement learning agents are not the ultimate optimal states but rather the best states practically attainable within the existing electric vehicle charging network. Some studies also define battery parameters such as capacity and charging rates as constraints to provide a genuine representation of battery behavior within the model. Q-learning and its deep extension, DQN, emerge as effective solutions, and the combination of DQN with DDPG can address the challenges posed by high dimensionality and discretization. Additionally, multi-agent reinforcement learning solutions have been proposed for dynamic electric vehicle charging scheduling problems, although their computational costs are high, and convergence poses significant challenges, limiting their practical application. Due to the extensive research methodologies in this field of reinforcement learning, several related review articles are recommended for further exploration (Abdullah et al., 2021; Fescioglu-Unver & Aktaş, 2023; Qiu et al., 2023; Shahriar et al., 2020).
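The value-based station-selection pattern described above can be illustrated with a minimal sketch. This is not taken from any cited paper: the state-of-charge buckets, station prices, queue times, and the cost-based reward are all made-up assumptions.

```python
import random

# Hypothetical value-based charging-station choice: the state is a coarse
# battery state-of-charge (SoC) bucket, the action is a station index, and
# the reward penalizes charging cost plus waiting time. Numbers are made up.
STATIONS = [  # (price per kWh, expected queue in minutes)
    (0.30, 20.0),  # cheap but crowded
    (0.45, 5.0),   # expensive but fast
    (0.25, 60.0),  # cheapest, very long queue
]

def reward(soc_bucket, station):
    price, queue = STATIONS[station]
    need_kwh = (3 - soc_bucket) * 10.0  # emptier battery -> more energy needed
    return -(price * need_kwh + 0.1 * queue)

def train(episodes=4000, alpha=0.1, eps=0.1, seed=1):
    random.seed(seed)
    q = [[0.0] * len(STATIONS) for _ in range(4)]  # 4 SoC buckets
    for _ in range(episodes):
        soc = random.randrange(4)  # a vehicle arrives with a random SoC bucket
        if random.random() < eps:
            a = random.randrange(len(STATIONS))
        else:
            a = max(range(len(STATIONS)), key=lambda s: q[soc][s])
        q[soc][a] += alpha * (reward(soc, a) - q[soc][a])  # one-shot task, no bootstrap
    return q
```

In this toy setting the learned policy sends an empty battery (bucket 0) to the cheap station despite its queue, while a nearly full battery (bucket 3) prefers the fast station; the grid and battery constraints discussed above would enter as restrictions on the feasible action set.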
5 Reinforcement learning in dynamic resource allocation
In this scenario, both resource points and demand points exhibit dynamic variability, where their positions, quantities, or characteristics may change over time. For instance, considering mobile services in urban settings such as traffic flow control (Koch et al., 2023), cloud computing service allocation (Jeyaraj et al., 2023), bike-sharing (DeMaio, 2009), taxi service (Yang & Wong, 1998), or mobile application-based services (Ervasti & Helaakoski, 2010), the positions and quantities of resources might fluctuate in response to user demands, which themselves can vary based on both time and location. In such cases, resource allocation necessitates real-time adjustments based on the dynamic changes in demand and resource availability.
5.1 Taxi order matching
In the taxi order matching problem, both supply and demand are dynamic: uncertainty arises from the constantly changing locations of demand points, the initial locations of drivers, and the required travel times. In this resource allocation scenario, the highly dynamic spatial relationship between supply and demand and the pursuit of long-term revenue pose complex challenges. Reinforcement learning methods for this scenario aim to improve the service quality of the resource matching system and drivers' total income over a long period. Depending on the agent setup, reinforcement learning models for taxi order matching fall into two modes: single-agent training and multi-agent training.
5.1.1 Single-agent model
In the single-agent model, all agents are defined with the same state, action space, and reward definition. Researchers train the agent on the experience trajectories of all drivers and apply it to generate matching policies. In this model, although the system is multi-agent from a global perspective, only a single agent is considered in the training phase. The most commonly used state elements include the current vehicle location, passenger location, and order details (Al-Abbasi et al., 2019; Holler et al., 2019; Tang et al., 2019; Wang et al., 2018; Xu et al., 2018). These order details include, in addition to existing orders, forecasted demand information derived from prediction models. For example, Zhou et al. (2023) proposed the ATD3-RO algorithm, which combines an adaptive Twin Delayed Deep Deterministic Policy Gradient with robust optimization to perform order prediction in uncertain passenger scenarios. Yang et al. (2021) modeled each demand as an agent and trained a value network to estimate the value of each demand rather than of each worker, then performed a separate many-to-many matching based on the learned values. This approach aims to facilitate taxi order matching based on future order predictions even under passenger uncertainty. Additionally, some methods discretize the environment into grids and utilize graph-based approaches to depict state characteristics (Gao et al., 2018; Haliem et al., 2021; Rong et al., 2016; Verma et al., 2017). In more specific scenarios, such as operations involving new-energy taxis, researchers have included the battery level of vehicles as part of the state (Shi et al., 2019; Tu et al., 2023).
Trip price and profit are the ultimate optimization objectives in this scenario. However, in setting rewards, researchers consider various factors, such as travel distance, waiting time, and the probability of successful transactions. Several improved Q-learning-based algorithms have been employed in this model. Wang et al. (2019a, 2019b) introduced the dynamic bipartite graph matching (DBGM) problem, taking a holistic system perspective. They trained an agent that encapsulated the entire request list and employed a restricted Q-learning algorithm to optimize decision duration, achieving near-maximized rewards. Shah et al. (2020) proposed a neural network-based Approximate Dynamic Programming (ADP) framework for a carpooling system, in which ADP and neural networks were employed to learn approximate value functions, and DQN techniques were utilized to stabilize the neural ADP. Guo et al. (2020) incorporated vehicle scheduling and route planning, using DDQN to balance passenger service quality against system operating costs derived from historical data. Specifically, the system considers reassigning idle vehicles to optimize vehicle route decisions; then, through dynamic programming, a vehicle allocation plan is suggested based on the values learned from vehicle routing. Wang et al. (2023a, 2023b) addressed passenger transfer in ride-sharing scenarios, allowing passengers to transfer between vehicles at transfer stations. Combining DQN with Integer Linear Programming (Verma & Sharma, 2020), they employed ILP to achieve optimal online scheduling and matching strategies at each decision stage, using DQN to learn approximate state values for each vehicle; specific policies were introduced to limit the state space and reduce computational complexity. Tu et al. (2023) utilized a spatiotemporal neural network approach to extract taxi demand patterns, combining it with DDQN to form a Spatiotemporal Double Deep Q-Network aimed at maximizing daily profits.
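The decoupling used by several of the works above, learning per-pair state values and then solving a small assignment problem over them, can be sketched as follows. The fares and post-trip values are invented numbers, and the exhaustive search stands in for the ILP or bipartite solver used in practice.

```python
from itertools import permutations

# Illustrative "learn values, then match" decoupling: a hypothetical learned
# value scores each driver-order pair by immediate fare plus the discounted
# value of the driver's post-trip state; a small one-to-one assignment
# problem then picks the best matching. All numbers are made up.
GAMMA = 0.9

def pair_score(fare, post_trip_value):
    return fare + GAMMA * post_trip_value

def best_matching(score):
    """Exhaustive search over assignments; an ILP/Hungarian solver in practice."""
    n = len(score)
    best_total, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(score[d][perm[d]] for d in range(n))
        if total > best_total:
            best_total, best_perm = total, perm
    return best_perm, best_total

# fares[d][o] and post-trip values[d][o] for 3 drivers x 3 orders (invented)
fares = [[8.0, 5.0, 2.0], [4.0, 9.0, 3.0], [2.0, 3.0, 7.0]]
values = [[1.0, 0.5, 4.0], [0.5, 1.0, 0.2], [3.0, 0.1, 1.5]]
score = [[pair_score(fares[d][o], values[d][o]) for o in range(3)] for d in range(3)]
assignment, total = best_matching(score)
```

Separating the learned value estimate from the per-step matching keeps the combinatorial part exact while letting reinforcement learning capture long-term revenue, which is the design choice the DQN + ILP works above exploit.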
The single-agent model simplifies decision-making and reduces computational complexity in taxi order matching, yet it may face limitations due to information constraints, insufficient collaboration, and poor adaptability to environmental changes. While it offers simplified management and computations, it cannot fully leverage information from other taxis, potentially hindering optimal overall efficiency. During significant environmental shifts, such as peak hours or unusual events leading to a surge in orders or changing traffic conditions, the single-agent model might struggle to adapt effectively. Relevant papers tackling these tasks are summarized in Table 4.
5.1.2 Multi-agent model
Since it is difficult to simulate the complex interactions between drivers and orders in a single-agent setting, many researchers have applied multi-agent reinforcement learning (MARL) to such spatial resource order allocation scenarios. Research on MARL for order matching can be categorized into three major classes: global feature models, multi-scale models, and model integration.
In methods based on global feature models, researchers focus on capturing overall characteristics and dynamic changes. Li et al. (2019a, 2019b) applied multi-agent reinforcement learning to address distributed features in point-to-point carpooling, capturing global dynamic changes in supply and demand using mean-field theory. Zhou et al. (2019) and Zhang et al. (2020) utilized methods such as Kullback-Leibler divergence optimization to expedite the DDQN learning process and balance the relationship between vehicle supply and order demand.
The multi-scale model methods emphasize multi-layered, multi-scale decision-making processes. Lin et al. (2018) proposed a contextual multi-agent reinforcement learning framework with geographic and collaborative contexts, encompassing contextual DQN and contextual multi-agent actor-critic algorithms; this framework enables coordination among numerous agents across diverse contexts. Jin et al. (2019) considered spatial grid units as working agents, grouped sets of these units into managerial agents, and employed a hierarchical approach to decision-making, utilizing a multi-head attention mechanism to integrate the influence of neighboring agents and capture crucial agents at each scale.
Model integration methods primarily focus on integrating multiple models or techniques to achieve a more comprehensive and effective decision-making process, leveraging the strengths of different models to address complex problems and enhance system performance. Ke et al. (2020) established a two-stage framework involving combinatorial optimization and multi-agent deep reinforcement learning: multi-agent reinforcement learning dynamically determines the delay time for each passenger request, while combinatorial optimization computes an optimal bipartite matching between idle drivers and waiting passengers in the matching pool. Liang et al. (2021) reconstructed the online vehicle scheduling problem by leveraging the topology of heterogeneous transportation networks, using a micro-network representation based on link nodes, and integrated the order scheduling stage with the vehicle routing stage. Singh et al. (2021) developed a multi-agent ride-sharing system using travel demand statistics and deep learning models, in which each vehicle makes independent decisions based on its individual impact without coordinating with other vehicles. Xu et al. (2023) combined order matching and proactive vehicle repositioning into a unified MDP model, addressing the challenges of extensive state spaces and driver competition.
Multi-agent reinforcement learning presents significant advantages for taxi order matching. Its flexibility and adaptability allow the system to dynamically adjust vehicle allocations, making optimal decisions based on real-time traffic and order demands and consequently enhancing overall service efficiency. Its collaborative decision-making capacity helps avoid duplicated service areas and facilitates information sharing, optimizing vehicle dispatch and improving system performance. However, the approach also faces challenges: its computational complexity demands substantial computing resources, and handling large-scale state spaces can become arduous; moreover, convergence, stability within multi-agent systems, and effective communication and collaboration remain difficult, potentially impacting the stability and accuracy of the learning process and decision outcomes. Relevant papers tackling these tasks are summarized in Table 5. Figure 5 shows an abstract multi-agent dynamic order dispatching case by reinforcement learning.
5.2 Delivery order matching
Beyond taxi order matching, reinforcement learning has also been applied to courier order matching. To tackle daily large-scale matching tasks in courier management, Li et al. (2019a, 2019b) first segmented the city into independent areas, each with a fixed number of couriers collaborating on package delivery and service requests. They introduced a soft-label clustering algorithm named "Balanced Delivery Service Burden" to distribute packages among couriers within each area. Addressing real-time pickup requests, they proposed a model called Contextual Cooperative Reinforcement Learning (CCRL) to guide each courier's short-term delivery and service locations. CCRL was modeled as a multi-agent system emphasizing courier cooperation while accounting for the system environment. In subsequent research, they introduced the CMARL algorithm, aiming to maximize the total pickup tasks completed by all couriers over an extended period (Li et al., 2020). Chen et al. (2019a, 2019b) introduced a framework that utilized multi-layered spatio-temporal maps to capture real-time representations within service areas; they modeled different couriers as multiple agents and used PPO to train the corresponding policies, enabling system agents to assign tasks in mobile crowdsourcing scenarios. Zou et al. (2022) presented an online-to-offline (O2O) order scheduler based on DDQN, intelligently assigning orders to couriers based on new order status and the collective courier status. Jahanshahi et al. (2022) explored food delivery services using different variants of DQN for a specific group of couriers to meet dynamic customer demands within a day, paying particular attention to the impact of limited available resources on food delivery; the results offered valuable insights into courier allocation with respect to order frequencies on specific dates. Relevant papers tackling these tasks are summarized in Table 6.
6 Discussion
While researchers have extensively utilized reinforcement learning to devise algorithms and model frameworks for resolving spatial resource allocation problems across various real-world scenarios, the construction of a more practical framework continues to pose challenges. We aim to briefly outline the primary hurdles encountered when employing reinforcement learning in establishing a framework for spatial resource allocation and explore potential research directions that we believe hold promise in addressing these challenges:
1) When using reinforcement learning to optimize spatial resource coverage and facility location selection, the diversity among finite resources in real-world scenarios has received limited consideration: existing studies uniformly set the attributes of every facility within a scenario, which contradicts actual application conditions. Future research could explore multi-agent reinforcement learning algorithms more deeply to address this challenge, since such algorithms can model the interactions, collaboration, and competition among multiple entities, leading to better resource allocation strategies.
2) Reinforcement learning often models the environment as a grid world to optimize spatial resource allocation and restricts the agent's actions to moves between neighboring cells. However, such simulations fail to faithfully represent the positional relationship between resource supply points and demand points, because most distances in the real environment are defined by road networks. By incorporating multi-modal data fusion, reinforcement learning algorithms can more accurately simulate and optimize spatial resource allocation problems, thereby improving the quality of decision-making.
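The gap between grid distance and road distance can be seen in a tiny example with a made-up road graph in which no direct link exists between two nearby points (say, a river with no bridge between them):

```python
import heapq

# Toy illustration: straight-line/grid distance between a demand point and a
# facility can badly underestimate road distance when an obstacle forces a
# detour. The coordinates and road lengths below are invented.
COORDS = {"A": (0, 0), "B": (0, 1), "C": (3, 1), "D": (3, 0)}
ROADS = {  # undirected edges with road lengths; note: no direct A-D link
    ("A", "B"): 1.0, ("B", "C"): 3.0, ("C", "D"): 1.0,
}

def neighbors(node):
    for (u, v), w in ROADS.items():
        if u == node:
            yield v, w
        elif v == node:
            yield u, w

def road_distance(src, dst):
    """Dijkstra's algorithm over the road graph."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, n = heapq.heappop(heap)
        if n == dst:
            return d
        if d > dist.get(n, float("inf")):
            continue
        for m, w in neighbors(n):
            nd = d + w
            if nd < dist.get(m, float("inf")):
                dist[m] = nd
                heapq.heappush(heap, (nd, m))
    return float("inf")

def manhattan(a, b):
    (x1, y1), (x2, y2) = COORDS[a], COORDS[b]
    return abs(x1 - x2) + abs(y1 - y2)
```

Here the grid (Manhattan) distance from A to D is 3, while the road distance is 5; an agent trained on grid moves would systematically underestimate such service costs, which is why road-network-aware environment models matter.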
3) Accounting for multiple constraints in the design of reinforcement learning algorithms for order matching is an important research area. These constraints include user preferences, time windows, cost calculations, structural constraints between pickup and delivery, and more. Effective training requires accommodating these additional constraints. Existing solutions, such as tailored constraint designs, typically treat them as soft constraints, which is unsuitable for scenarios with strict constraints that cannot be violated. Modeling these practical requirements effectively remains a key challenge. By introducing adversarial agents and establishing a competitive model for resource allocation, allocation strategies under multiple constraints could potentially be optimized through adversarial learning.
4) The core of reinforcement learning training is reward feedback. Leveraging expert data extensively to enhance learning capability and reduce learning cost is a valuable research direction in spatial resource allocation. For sparse-reward tasks, setting short-term rewards based on the overlap with expert allocation methods can improve learning efficiency.
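The overlap-based shaping idea above can be written down in a few lines. This is a minimal sketch, not a method from any cited paper: the placement representation and the shaping weight are hypothetical.

```python
# Hypothetical sketch of overlap-based reward shaping: a sparse terminal
# reward is augmented with a small bonus proportional to the overlap between
# the agent's allocation so far and a recorded expert allocation.
def shaped_reward(base_reward, agent_alloc, expert_alloc, weight=0.1):
    """agent_alloc / expert_alloc: sets of (facility, site) placements.
    The weight is an assumed hyperparameter balancing imitation vs. reward."""
    if not expert_alloc:
        return base_reward
    overlap = len(set(agent_alloc) & set(expert_alloc)) / len(expert_alloc)
    return base_reward + weight * overlap
```

Such a term gives the agent a dense learning signal on every step while the true (sparse) objective still dominates; the weight would be annealed or tuned so the expert prior does not override better-than-expert solutions.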
5) Combining reinforcement learning with graph neural networks or transfer learning models (Pan & Yang, 2009) offers a potent approach to addressing spatial resource allocation challenges. GNNs demonstrate remarkable capabilities in spatial relationship modeling, effectively capturing the spatial correlations between resources and demand points. They can learn node features, aiding in understanding the attributes of resource and demand points and providing deeper insights for devising allocation strategies. Meanwhile, transfer learning facilitates knowledge transfer and model generalization: knowledge learned in one environment can be applied to another, helping models adapt to new resource allocation scenarios, especially when data are scarce. This combination unites the decision-optimization prowess of reinforcement learning with the spatial sensitivity of graph neural networks, enabling models to better optimize allocation strategies, adapt in real time, and learn across diverse environments, thereby enhancing decision-making efficiency and accuracy. Overall, this fusion holds the potential to overcome the limitations of reinforcement learning in resource allocation, boosting model performance, adaptability, and generalizability.
6) When applying reinforcement learning to spatial resource allocation problems, ethical considerations are crucial, especially in decision-making processes involving fairness, transparency, and accountability. Decisions in fields such as urban planning, traffic management, and the layout of public facilities have long-term impacts on social structures, and it is essential to ensure that the design and execution of reinforcement learning algorithms do not exacerbate inequality. The algorithms must allocate resources without bias, guaranteeing equal rights for community members while ensuring the interpretability and oversight of the decision-making process. Moreover, clear accountability is paramount in addressing the negative effects resulting from unfair decisions. To achieve these goals, interdisciplinary collaboration is indispensable, including but not limited to data scientists, urban planners, sociologists, and ethicists. They must work together to establish ethical guidelines and regulatory frameworks, ensuring that technological advancements do not sacrifice social justice and that technical solutions can promote fairness and equity in all aspects.
7) When designing spatial resource allocation systems based on reinforcement learning, computational and infrastructural requirements must be considered. For data collection, urban environments necessitate real-time data to support decision-making and learning, involving sensor networks, surveillance equipment, and pedestrian tracking, among others. In terms of data processing, feature extraction, preprocessing, and cleaning are required, including techniques such as data normalization, noise reduction, and interpolation. For model deployment, efficient computational and storage architectures need to be designed, utilizing distributed systems and model compression and acceleration technologies to enhance real-time performance and scalability. Additionally, personal privacy and data security must be safeguarded through measures such as encryption, anonymization, and access control.
7 Conclusion
Spatial resource allocation involves distributing finite resources across different locations or regions to meet specific needs or optimize particular objectives. These resources can be various types of facilities, such as charging stations, service centers, transportation nodes, logistics facilities, etc., while demands can arise from daily needs, traffic flows, energy requirements, and more. Key considerations in addressing spatial resource allocation problems encompass the location, quantity, distribution, utilization efficiency of resources, and optimal strategies to fulfill demands. Reinforcement learning, as a potent decision-making and optimization method, offers a new approach and tool for tackling these complex resource allocation issues. In this paper, we conduct a detailed analysis of reinforcement learning methods used in spatial resource allocation problems.
The article divides the spatial resource allocation problem into three categories: static demand-based resource allocation, static resource-based allocation, and dynamic resource allocation. The DQN algorithm, known for solving discrete problems, and its extended algorithms have proven to be the most common and effective approaches for addressing this issue. Both single-agent and multi-agent reinforcement learning algorithms are utilized across various scenarios. Particularly in dynamic resource allocation scenarios, the effectiveness of multi-agent reinforcement learning methods surpasses that of single-agent reinforcement learning algorithms. This is because their flexibility and adaptability enable the system to dynamically adjust resource allocation, making optimal decisions based on evolving conditions, consequently enhancing overall service efficiency. The collaborative decision-making capabilities of multi-agent systems also help in avoiding overlapping resource service areas, promoting information sharing, optimizing vehicle dispatch, and improving system performance. The proportions of different research types and algorithm types are shown in Fig. 6. It can be seen that there is less research on static resource allocation at this stage, and reinforcement learning based on value functions is the most widely studied and used.
In different scenarios, the setup of the environment's state, rewards, and actions varies, which is a crucial step in reinforcement learning solutions. Input features are selected based on the objective of the solution and the specific part of the system to achieve the intended goals. Research indicates that solving different parts of the problem might lead to different solutions. Most papers establish objective functions and multiple constraints based on their own scenarios and available data, including them in the algorithm's positive or negative rewards, to generate more practical and utilitarian models.
When discussing the application of reinforcement learning in solving spatial resource allocation problems, despite its excellent adaptability and decision optimization capabilities, there are also limitations and potential drawbacks to consider. Firstly, reinforcement learning algorithms typically require a large amount of environmental interaction data to train policies, which in the context of spatial resource allocation may imply significant interventions in real-world systems or high precision requirements for simulation environments. Secondly, the convergence and stability of reinforcement learning heavily depend on hyperparameter settings and initial conditions, and incorrect configurations may result in inefficient learning processes or significant differences in policy performance. Additionally, reinforcement learning models in complex environments may face the curse of dimensionality with large state and action spaces, increasing computational complexity. Moreover, how to quickly adapt and effectively generalize to new situations in the face of environmental changes remains a challenge for reinforcement learning. Lastly, compared to traditional algorithms, reinforcement learning results often lack interpretability, which can be a significant obstacle in spatial resource management scenarios that require strict compliance and scrutiny. Therefore, while reinforcement learning shows potential in spatial resource allocation problems, it is essential to fully consider its limitations and the risks specific to the application context.
This article provides a comprehensive review of the application of reinforcement learning in the domain of spatial resource allocation, delving into the practical applicability and potential limitations of the technology. It not only synthesizes existing literature to furnish researchers with an integrated knowledge base but also analyzes key issues encountered when applying reinforcement learning, offering viable directions for addressing these challenges. Furthermore, the paper unveils potential opportunities for reinforcement learning in spatial resource allocation, sparking inspiration for innovative solutions, and setting clear research trajectories for the future. Overall, this study offers valuable insights that can assist researchers in better understanding the challenges, opportunities, and potential directions for the application of reinforcement learning in spatial resource allocation.
Availability of data and materials
Not applicable.
References
Abdullah, H. M., Gastli, A., & Ben-Brahim, L. (2021). Reinforcement learning based EV charging management systems–a review. IEEE Access, 9, 41506–41531.
Al Smadi, T., Al Issa, H. A., Trad, E., & Al Smadi, K. A. (2015). Artificial intelligence for speech recognition based on neural networks. Journal of Signal and Information Processing, 6(02), 66.
Al-Abbasi, A. O., Ghosh, A., & Aggarwal, V. (2019). Deeppool: Distributed model-free algorithm for ride-sharing using deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 20(12), 4714–4727.
Al-Dohuki, S., Wu, Y., Kamw, F., Yang, J., Li, X., Zhao, Y., Ye, X., Chen, W., Ma, C., & Wang, F. (2016). Semantictraj: A new approach to interacting with massive taxi trajectories. IEEE Transactions on Visualization and Computer Graphics, 23(1), 11–20.
Alloghani, M., Al-Jumeily, D., Mustafina, J., Hussain, A., & Aljaaf, A. J. (2020). A systematic review on supervised and unsupervised machine learning algorithms for data science. Supervised and unsupervised learning for data science, 3–21.
Anđelković, A. S., & Bajatović, D. (2020). Integration of weather forecast and artificial intelligence for a short-term city-scale natural gas consumption prediction. Journal of Cleaner Production, 266, 122096.
Arulkumaran, K., Deisenroth, M. P., Brundage, M., & Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine, 34(6), 26–38.
Assunção, R. M., Neves, M. C., Câmara, G., & da Costa Freitas, C. (2006). Efficient regionalization techniques for socio-economic geographical units using minimum spanning trees. International Journal of Geographical Information Science, 20(7), 797–811.
Atakishiyev, S., Salameh, M., Yao, H., & Goebel, R. (2021). Explainable artificial intelligence for autonomous driving: A comprehensive overview and field guide for future research directions. arXiv preprint arXiv:2112.11561.
Aylor, J. H., Thieme, A., & Johnso, B. (1992). A battery state-of-charge indicator for electric wheelchairs. IEEE Transactions on Industrial Electronics, 39(5), 398–409.
Azadivar, F. (1989). Optimum allocation of resources between the random access and rack storage spaces in an automated warehousing system. The International Journal of Production Research, 27(1), 119–131.
Baboo, S. S., & Shereef, I. K. (2010). An efficient weather forecasting system using artificial neural network. International Journal of Environmental Science and Development, 1(4), 321.
Baldassarre, S., Bruno, G., Diglio, A., & Piccolo, C. (2023). A sample approximation solution procedure for chance-constrained districting problems. Computers & Operations Research, 160, 106376.
Barto, A. G., & Sutton, R. S. (1997). Reinforcement learning in artificial intelligence. In Advances in Psychology (Vol. 121, pp. 358–386). Elsevier.
Beham, M. P., & Roomi, S. M. M. (2013). A review of face recognition methods. International Journal of Pattern Recognition and Artificial Intelligence, 27(04), 1356005.
Bellman, R. (1966). Dynamic programming. Science, 153(3731), 34–37.
Benjaafar, S., Heragu, S. S., & Irani, S. A. (2002). Next generation factory layouts: research challenges and recent progress. Interfaces, 32(6), 58–76.
Bhandarkar, A. B., Jayaweera, S. K., & Lane, S. A. (2022). User coverage maximization for a UAV-mounted base station using reinforcement learning and greedy methods. 2022 International Conference on Artificial Intelligence in Information and Communication (ICAIIC), IEEE, pp. 351–356.
Bouzy, B., & Chaslot, G. (2006). Monte-Carlo Go reinforcement learning experiments. 2006 IEEE Symposium on Computational Intelligence and Games.
Bromo, C., Godio, S., & Guglieri, G. (2023). Reinforcement Learning based Coverage Planning for UAVs Fleets. AIAA SCITECH 2023 Forum, p. 1149.
Brunke, L., Greeff, M., Hall, A. W., Yuan, Z., Zhou, S., Panerati, J., & Schoellig, A. P. (2022). Safe learning in robotics: from learning-based control to safe reinforcement learning. Annual Review of Control, Robotics, and Autonomous Systems, 5, 411–444.
Cabezas, X., & García, S. (2023). A semi-Lagrangian relaxation heuristic algorithm for the simple plant location problem with order. Journal of the Operational Research Society, 74(11), 2391–2402.
Cassano, L., Alghunaim, S. A., & Sayed, A. H. (2019). Team policy learning for multi-agent reinforcement learning. ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE, pp. 3062–3066.
Chaharsooghi, S. K., & Kermani, A. H. M. (2008). An effective ant colony optimization algorithm (ACO) for multi-objective resource allocation problem (MORAP). Applied Mathematics and Computation, 200(1), 167–177.
Chalmers, D. J., French, R. M., & Hofstadter, D. R. (1992). High-level perception, representation, and analogy: A critique of artificial intelligence methodology. Journal of Experimental & Theoretical Artificial Intelligence, 4(3), 185–211.
Chen, X., Wang, S., Li, H., Liang, H., Li, Z., & Lu, H. (2023). An attention model with multiple decoders for solving p-Center problems. International Journal of Applied Earth Observation and Geoinformation, 125, 103526.
Chen, J., Yuan, B., & Tomizuka, M. (2019). Model-free deep reinforcement learning for urban autonomous driving. 2019 IEEE intelligent transportation systems conference (ITSC), IEEE, p. 2765-2771.
Chen, Y., Qian, Y., Yao, Y., Wu, Z., Li, R., Zhou, Y., Hu, H., & Xu, Y. (2019). Can sophisticated dispatching strategy acquired by reinforcement learning? A case study in dynamic courier dispatching system. arXiv preprint arXiv:1903.02716.
Côme, E. (2024). Bayesian contiguity constrained clustering. Statistics and Computing, 34(2), 1–20.
Dabney, W., Ostrovski, G., Silver, D., & Munos, R. (2018). Implicit quantile networks for distributional reinforcement learning. International conference on machine learning.
Degris, T., Pilarski, P. M., & Sutton, R. S. (2012). Model-free reinforcement learning with continuous action in practice. 2012 American Control Conference (ACC), IEEE, p. 2177-2182.
DeMaio, P. (2009). Bike-sharing: History, impacts, models of provision, and future. Journal of Public Transportation, 12(4), 41–56.
Deriu, J., Rodrigo, A., Otegi, A., Echegoyen, G., Rosset, S., Agirre, E., & Cieliebak, M. (2021). Survey on evaluation methods for dialogue systems. Artificial Intelligence Review, 54, 755–810.
Di, X., & Yu, P. (2021). Multi-agent reinforcement learning of 3D furniture layout simulation in indoor graphics scenes. arXiv preprint arXiv:2102.09137.
Duan, Y., Edwards, J. S., & Dwivedi, Y. K. (2019). Artificial intelligence for decision making in the era of Big Data–evolution, challenges and research agenda. International Journal of Information Management, 48, 63–71.
Duque, J. C., & Church, R. L. (2004). A new heuristic model for designing analytical regions. North American Meeting of the International Regional Science Association, Seattle.
El-Alfy, E.-S., Yao, Y.-D., & Heffes, H. (2006). A learning approach for prioritized handoff channel allocation in mobile multimedia networks. IEEE Transactions on Wireless Communications, 5(7), 1651–1660.
Ervasti, M., & Helaakoski, H. (2010). Case study of application-based mobile service acceptance and development in Finland. International Journal of Information Technology and Management, 9(3), 243–259.
Etebari, F. (2019). A column generation algorithm for the choice-based congested location-pricing problem. Computers & Industrial Engineering, 130, 687–698.
Eyles, J., Smith, D. M., & Woods, K. J. (1982). Spatial resource allocation and state practice: the case of health service planning in London. Regional Studies, 16(4), 239–253.
Fan, J., Wang, Z., Xie, Y., & Yang, Z. (2020). A theoretical analysis of deep Q-learning. Learning for dynamics and control.
Feng, R., Yu, F., Xu, J., & Hu, X. (2021). Knowledge gaps in immune response and immunotherapy involving nanomaterials: Databases and artificial intelligence for material design. Biomaterials, 266, 120469.
Feriani, A., & Hossain, E. (2021). Single and multi-agent deep reinforcement learning for AI-enabled wireless networks: a tutorial. IEEE Communications Surveys & Tutorials, 23(2), 1226–1252.
Fescioglu-Unver, N., & Aktaş, M. Y. (2023). Electric vehicle charging service operations: a review of machine learning applications for infrastructure planning, control, pricing and routing. Renewable and Sustainable Energy Reviews, 188, 113873.
Floudas, C. A., & Lin, X. (2005). Mixed integer linear programming in process scheduling: Modeling, algorithms, and applications. Annals of Operations Research, 139, 131–162.
Fortunato, M., Azar, M. G., Piot, B., Menick, J., Osband, I., Graves, A., Mnih, V., Munos, R., Hassabis, D., & Pietquin, O. (2017). Noisy networks for exploration. arXiv preprint arXiv:1706.10295.
Gao, Y., Jiang, D., & Xu, Y. (2018). Optimize taxi driving strategies based on reinforcement learning. International Journal of Geographical Information Science, 32(8), 1677–1696.
Giordani, S., Lujak, M., & Martinelli, F. (2013). A distributed multi-agent production planning and scheduling framework for mobile robots. Computers & Industrial Engineering, 64(1), 19–30.
Glewwe, P., Siameh, C., Sun, B., & Wisniewski, S. (2021). School resources and educational outcomes in developing countries. The Routledge Handbook of the Economics of Education, 218–252.
Gong, Y.-J., Zhang, J., Chung, H.S.-H., Chen, W.-N., Zhan, Z.-H., Li, Y., & Shi, Y.-H. (2012). An efficient resource allocation scheme using particle swarm optimization. IEEE Transactions on Evolutionary Computation, 16(6), 801–816.
Gopi, S. P., & Magarini, M. (2021). Reinforcement learning aided UAV base station location optimization for rate maximization. Electronics, 10(23), 2953.
Grondman, I., Busoniu, L., Lopes, G. A., & Babuska, R. (2012). A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6), 1291–1307.
Guo, D. (2008). Regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP). International Journal of Geographical Information Science, 22(7), 801–823.
Guo, G., & Xu, Y. (2020). A deep reinforcement learning approach to ride-sharing vehicle dispatching in autonomous mobility-on-demand systems. IEEE Intelligent Transportation Systems Magazine, 14(1), 128–140.
Guo, K., Yang, Z., Yu, C.-H., & Buehler, M. J. (2021). Artificial intelligence and machine learning in design of mechanical materials. Materials Horizons, 8(4), 1153–1172.
Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. International conference on machine learning.
Hajjej, F., Hamdi, M., Ejbali, R., & Zaied, M. (2020). A distributed coverage hole recovery approach based on reinforcement learning for Wireless Sensor Networks. Ad Hoc Networks, 101, 102082.
Haliem, M., Mani, G., Aggarwal, V., & Bhargava, B. (2021). A distributed model-free ride-sharing approach for joint matching, pricing, and dispatching using deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 22(12), 7931–7942.
Hassabis, D. (2017). Artificial intelligence: chess match of the century. Nature, 544, 413–414.
Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. Proceedings of the AAAI conference on artificial intelligence.
Hessel, M., Modayil, J., Van Hasselt, H., Schaul, T., Ostrovski, G., Dabney, W., Horgan, D., Piot, B., Azar, M., & Silver, D. (2018). Rainbow: Combining improvements in deep reinforcement learning. Proceedings of the AAAI conference on artificial intelligence.
Hochba, D. S. (1997). Approximation algorithms for NP-hard problems. ACM Sigact News, 28(2), 40–52.
Holler, J., Vuorio, R., Qin, Z., Tang, X., Jiao, Y., Jin, T., Singh, S., Wang, C., & Ye, J. (2019). Deep reinforcement learning for multi-driver vehicle dispatching and repositioning problem. 2019 IEEE International Conference on Data Mining (ICDM).
Hu, J., & Wellman, M. P. (2003). Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research, 4(Nov), 1039–1069.
Jaderberg, M., Czarnecki, W. M., Dunning, I., Marris, L., Lever, G., Castaneda, A. G., Beattie, C., Rabinowitz, N. C., Morcos, A. S., & Ruderman, A. (2019). Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science, 364(6443), 859–865.
Jahanshahi, H., Bozanta, A., Cevik, M., Kavuk, E. M., Tosun, A., Sonuc, S. B., Kosucu, B., & Başar, A. (2022). A deep reinforcement learning approach for the meal delivery problem. Knowledge-Based Systems, 243, 108489.
Jeyaraj, R., Balasubramaniam, A. M. A. A. K., Guizani, N., & Paul, A. (2023). Resource management in cloud and cloud-influenced technologies for Internet of Things applications. ACM Computing Surveys, 55(12), 1–37.
Jiang, Y., Zhao, S., Li, H., Qin, Y., & Yang, X. (2023). A hybrid spectral clustering simulated annealing algorithm for the street patrol districting problem. Complex & Intelligent Systems, 9(2), 1791–1807.
Jin, J., Zhou, M., Zhang, W., Li, M., Guo, Z., Qin, Z., Jiao, Y., Tang, X., Wang, C., & Wang, J. (2019). CoRide: Joint order dispatching and fleet management for multi-scale ride-hailing platforms. Proceedings of the 28th ACM international conference on information and knowledge management.
Johannink, T., Bahl, S., Nair, A., Luo, J., Kumar, A., Loskyll, M., Ojea, J. A., Solowjow, E., & Levine, S. (2019). Residual reinforcement learning for robot control. 2019 International Conference on Robotics and Automation (ICRA).
Johnson, D. S., Papadimitriou, C. H., & Yannakakis, M. (1988). How easy is local search? Journal of Computer and System Sciences, 37(1), 79–100.
Juels, A., & Wattenberg, M. (1995). Stochastic hillclimbing as a baseline method for evaluating genetic algorithms. Advances in Neural Information Processing Systems, 8, 430–436.
Kaiser, L., Babaeizadeh, M., Milos, P., Osinski, B., Campbell, R. H., Czechowski, K., Erhan, D., Finn, C., Kozakowski, P., & Levine, S. (2019). Model-based reinforcement learning for Atari. arXiv preprint arXiv:1903.00374.
Ke, J., Xiao, F., Yang, H., & Ye, J. (2020). Learning to delay in ride-sourcing systems: a multi-agent deep reinforcement learning framework. IEEE Transactions on Knowledge and Data Engineering, 34(5), 2280–2292.
Kibria, M. G., Nguyen, K., Villardi, G. P., Zhao, O., Ishizu, K., & Kojima, F. (2018). Big data analytics, machine learning, and artificial intelligence in next-generation wireless networks. IEEE Access, 6, 32328–32338.
Kim, B., Jeong, Y., & Shin, J. G. (2020). Spatial arrangement using deep reinforcement learning to minimise rearrangement in ship block stockyards. International Journal of Production Research, 58(16), 5062–5076.
Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Al Sallab, A. A., Yogamani, S., & Pérez, P. (2021). Deep reinforcement learning for autonomous driving: a survey. IEEE Transactions on Intelligent Transportation Systems, 23(6), 4909–4926.
Klar, M., Glatt, M., & Aurich, J. C. (2021). An implementation of a reinforcement learning based algorithm for factory layout planning. Manufacturing Letters, 30, 1–4.
Klar, M., Hussong, M., Ruediger-Flore, P., Yi, L., Glatt, M., & Aurich, J. C. (2022). Scalability investigation of Double Deep Q Learning for factory layout planning. Procedia CIRP, 107, 161–166.
Klar, M., Langlotz, P., & Aurich, J. C. (2022). A Framework for Automated Multiobjective Factory Layout Planning using Reinforcement Learning. Procedia CIRP, 112, 555–560.
Klar, M., Mertes, J., Glatt, M., Ravani, B., & Aurich, J. (2023). A Holistic Framework for Factory Planning Using Reinforcement Learning. International Research Training Group Conference on Physical Modeling for Virtual Manufacturing Systems and Processes.
Ko, J., Nazarian, E., Nam, Y., & Guo, Y. (2015). Integrated redistricting, location-allocation and service sharing with intra-district service transfer to reduce demand overload and its disparity. Computers, Environment and Urban Systems, 54, 132–143.
Kober, J., Bagnell, J. A., & Peters, J. (2013). Reinforcement learning in robotics: a survey. The International Journal of Robotics Research, 32(11), 1238–1274.
Koch, L., Brinkmann, T., Wegener, M., Badalian, K., & Andert, J. (2023). Adaptive traffic light control with deep reinforcement learning: An evaluation of traffic flow and energy consumption. IEEE Transactions on Intelligent Transportation Systems.
Kong, X., Liu, Y., Wang, Y., Tong, D., & Zhang, J. (2017). Investigating public facility characteristics from a spatial interaction perspective: A case study of Beijing hospitals using taxi data. ISPRS International Journal of Geo-Information, 6(2), 38.
Kumar, Y., Koul, A., Singla, R., & Ijaz, M. F. (2022). Artificial intelligence in disease diagnosis: a systematic literature review, synthesizing framework and future research agenda. Journal of Ambient Intelligence and Humanized Computing, 1–28.
Lample, G., & Chaplot, D. S. (2017). Playing FPS games with deep reinforcement learning. Proceedings of the AAAI Conference on Artificial Intelligence.
Levine, S., Kumar, A., Tucker, G., & Fu, J. (2020). Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643.
Li, T., Han, X., Wu, W., & Sun, H. (2023). Robust expansion planning and hardening strategy of meshed multi-energy distribution networks for resilience enhancement. Applied Energy, 341, 121066.
Li, D., Zhang, Z., Alizadeh, B., Zhang, Z., Duffield, N., Meyer, M. A., Thompson, C. M., Gao, H., & Behzadan, A. H. (2024). A reinforcement learning-based routing algorithm for large street networks. International Journal of Geographical Information Science, 38(2), 183–215.
Li, Z., Wang, R., & Wang, Y. (2007). A quadratic programming model for political districting problem. Proceedings of the First International Symposium on Optimization and Systems Biology (OSB), pp. 427–435.
Li, M., Qin, Z., Jiao, Y., Yang, Y., Wang, J., Wang, C., Wu, G., & Ye, J. (2019). Efficient ridesharing order dispatching with mean field multi-agent reinforcement learning.
Li, Y., Zheng, Y., & Yang, Q. (2019). Efficient and effective express via contextual cooperative reinforcement learning. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
Li, Y., Zheng, Y., & Yang, Q. (2020). Cooperative multi-agent reinforcement learning in express system. Proceedings of the 29th ACM International Conference on Information & Knowledge Management.
Li, Y. (2017). Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274.
Liang, E., Wen, K., Lam, W. H., Sumalee, A., & Zhong, R. (2021). An integrated reinforcement learning and centralized programming approach for online taxi dispatching. IEEE Transactions on Neural Networks and Learning Systems, 33(9), 4742–4756.
Liang, H., Wang, S., Li, H., Zhou, L., Chen, H., Zhang, X., & Chen, X. (2024). Sponet: solve spatial optimization problem using deep reinforcement learning for urban spatial decision analysis. International Journal of Digital Earth, 17(1), 2299211.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
Lin, K., Zhao, R., Xu, Z., & Zhou, J. (2018). Efficient large-scale fleet management via multi-agent deep reinforcement learning. Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining.
Littman, M. L. (1994). Markov games as a framework for multi-agent reinforcement learning. Machine Learning Proceedings 1994 (pp. 157–163). Morgan Kaufmann.
Littman, M. L. (2001). Friend-or-foe Q-learning in general-sum games. ICML, pp. 322–328.
Liu, C. H., Chen, Z., Tang, J., Xu, J., & Piao, C. (2018). Energy-efficient UAV control for effective and fair communication coverage: A deep reinforcement learning approach. IEEE Journal on Selected Areas in Communications, 36(9), 2059–2070.
Liu, J., Han, K., Chen, X. M., & Ong, G. P. (2019). Spatial-temporal inference of urban traffic emissions based on taxi trajectories and multi-source urban data. Transportation Research Part C: Emerging Technologies, 106, 145–165.
Liu, J., Sun, J., & Qi, X. (2023). Optimal placement of charging stations in road networks: a reinforcement learning approach with attention mechanism. Applied Sciences, 13(14), 8473.
Liu, B., Fu, Y., Yao, Z., & Xiong, H. (2013). Learning geographical preferences for point-of-interest recommendation. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining.
Liu, J. N., Hu, Y., You, J. J., & Chan, P. W. (2014). Deep neural network based feature representation for weather forecasting. Proceedings on the International Conference on Artificial Intelligence (ICAI).
Long, E. F., Nohdurft, E., & Spinler, S. (2018). Spatial resource allocation for emerging epidemics: a comparison of greedy, myopic, and dynamic policies. Manufacturing & Service Operations Management, 20(2), 181–198.
Lowe, R., Wu, Y. I., Tamar, A., Harb, J., Pieter Abbeel, O., & Mordatch, I. (2017). Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems, 30, 6379–6390.
Meng, S., & Kan, Z. (2021). Deep reinforcement learning-based effective coverage control with connectivity constraints. IEEE Control Systems Letters, 6, 283–288.
Mimis, A., Rovolis, A., & Stamou, M. (2012). An AZP-ACO method for region-building. Artificial Intelligence: Theories and Applications: 7th Hellenic Conference on AI, SETN 2012, Lamia, Greece, May 28–31, 2012. Proceedings, 7.
Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. International conference on machine learning.
Moor, J. H. (1976). An analysis of the Turing test. Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition, 30(4), 249–257.
Mu, S., Huang, X., Wang, M., Zhang, D., Xu, D., & Li, X. (2023). Optimizing pedestrian simulation based on expert trajectory guidance and deep reinforcement learning. GeoInformatica, 1–28.
Murray, A. T., & Church, R. L. (1996). Applying simulated annealing to location-planning models. Journal of Heuristics, 2, 31–53.
Nachum, O., Norouzi, M., Xu, K., & Schuurmans, D. (2017). Trust-PCL: An off-policy trust region method for continuous control. arXiv preprint arXiv:1707.01891.
Naranje, V., Reddy, P. V., & Sharma, B. K. (2019). Optimization of factory layout design using simulation tool. 2019 IEEE 6th International Conference on Industrial Engineering and Applications (ICIEA).
Nassif, A. B., Shahin, I., Attili, I., Azzeh, M., & Shaalan, K. (2019). Speech recognition using deep neural networks: a systematic review. IEEE Access, 7, 19143–19165.
Nasteski, V. (2017). An overview of the supervised machine learning methods. Horizons B, 4, 51–62.
Nemoto, T., & Hotta, K. (2003). Modelling and solution of the problem of optimal electoral districting. Communications of the OR Society of Japan, 48, 300–306.
Ni, J., Young, T., Pandelea, V., Xue, F., & Cambria, E. (2023). Recent advances in deep learning based dialogue systems: a systematic survey. Artificial Intelligence Review, 56(4), 3055–3155.
Nie, J., & Haykin, S. (1999). A dynamic channel assignment policy through Q-learning. IEEE Transactions on Neural Networks, 10(6), 1443–1455.
O’Leary, D. E. (2013). Artificial intelligence and big data. IEEE Intelligent Systems, 28(2), 96–99.
Openshaw, S. (1977). A geographical solution to scale and aggregation problems in region-building, partitioning and spatial modelling. Transactions of the Institute of British Geographers, 459–472.
Pan, R., Zhang, T.-Y., He, H., Zheng, Z.-X., Dong, Z.-Y., Zhao, H.-X., Xu, M.-Y., Luo, Z.-N., Hu, C.-Y., & Tang, Y.-L. (2023). Mixed chlorine/chloramines in disinfected water and drinking water distribution systems (DWDSs): A critical review. Water Research, 247, 120736.
Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
Panov, A. I., Yakovlev, K. S., & Suvorov, R. (2018). Grid path planning with deep reinforcement learning: preliminary results. Procedia Computer Science, 123, 347–353.
Parragh, S. N., Tricoire, F., & Gutjahr, W. J. (2022). A branch-and-Benders-cut algorithm for a bi-objective stochastic facility location problem. Or Spectrum, 44(2), 419–459.
Perez-Liebana, D., Samothrakis, S., Togelius, J., Schaul, T., & Lucas, S. (2016). General video game AI: Competition, challenges and opportunities. Proceedings of the AAAI Conference on Artificial Intelligence.
Petratos, A., Ting, A., Padmanabhan, S., Zhou, K., Hageman, D., Pisel, J. R., & Pyrcz, M. J. (2021). Optimal placement of public electric vehicle charging stations using deep reinforcement learning. arXiv preprint arXiv:2108.07772.
Pham, H. X., La, H. M., Feil-Seifer, D., & Nefian, A. (2018). Cooperative and distributed reinforcement learning of drones for field coverage. arXiv preprint arXiv:1803.07250.
Philco, L. O., Marrone, L., & Estupiñan, E. (2021). MiA-CODER: a multi-intelligent agent-enabled reinforcement learning for accurate coverage hole detection and recovery in unequal cluster-tree-based QoSensing WSN. Applied Sciences, 11(23), 11134.
Pinar Saygin, A., Cicekli, I., & Akman, V. (2000). Turing test: 50 years later. Minds and Machines, 10(4), 463–518.
Qiu, D., Wang, Y., Hua, W., & Strbac, G. (2023). Reinforcement learning for electric vehicle applications in power systems: a critical review. Renewable and Sustainable Energy Reviews, 173, 113052.
Ramstedt, S., & Pal, C. (2019). Real-time reinforcement learning. Advances in Neural Information Processing Systems, 32.
Renaud, J.-c., & Tham, C.-k. (2006). Coordinated sensing coverage in sensor networks using distributed reinforcement learning. 2006 14th IEEE International Conference on Networks.
Ribino, P., & Bonomolo, M. (2023). A multi-objective reinforcement learning approach for furniture arrangement with optimal IEQ in multi-occupant offices. Journal of Ambient Intelligence and Humanized Computing, 1–22.
Rong, H., Zhou, X., Yang, C., Shafiq, Z., & Liu, A. (2016). The rich and the poor: A Markov decision process approach to optimizing taxi driver revenue efficiency. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management.
Rosaci, D., & Sarnè, G. M. (2014). Multi-agent technology and ontologies to support personalization in B2C E-Commerce. Electronic Commerce Research and Applications, 13(1), 13–23.
Salazar-Aguilar, M. A., Ríos-Mercado, R. Z., & González-Velarde, J. L. (2011). A bi-objective programming model for designing compact and balanced territories in commercial districting. Transportation Research Part C: Emerging Technologies, 19(5), 885–895.
Sallab, A. E., Abdou, M., Perot, E., & Yogamani, S. (2017). Deep reinforcement learning framework for autonomous driving. arXiv preprint arXiv:1704.02532.
Samuel, A. L. (1959). Machine learning. The Technology Review, 62(1), 42–45.
Sanei, O., Nasiri, V., Marjani, M., & Moattar Husseini, S. (2011). A heuristic algorithm for the warehouse space assignment problem considering operational constraints: with application in a case study. 2011 International Conference on Industrial Engineering and Operations Management, Kuala Lumpur, Malaysia.
Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20(1), 61–80.
Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2015). Prioritized experience replay. arXiv preprint arXiv:1511.05952.
Schrijver, A. (1998). Theory of linear and integer programming. John Wiley & Sons.
Schrittwieser, J., Antonoglou, I., Hubert, T., Simonyan, K., Sifre, L., Schmitt, S., Guez, A., Lockhart, E., Hassabis, D., & Graepel, T. (2020). Mastering Atari, Go, chess and shogi by planning with a learned model. Nature, 588(7839), 604–609.
Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust region policy optimization. International conference on machine learning.
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Seah, M. W. M., Tham, C.-K., Srinivasan, V., & Xin, A. (2007). Achieving coverage through distributed reinforcement learning in wireless sensor networks. 2007 3rd international Conference on intelligent sensors, sensor networks and information.
Shah, S., Lowalekar, M., & Varakantham, P. (2020). Neural approximate dynamic programming for on-demand ride-pooling. Proceedings of the AAAI Conference on Artificial Intelligence.
Shahriar, S., Al-Ali, A.-R., Osman, A. H., Dhou, S., & Nijim, M. (2020). Machine learning approaches for EV charging behavior: a review. IEEE Access, 8, 168980–168993.
Shalev-Shwartz, S., Shammah, S., & Shashua, A. (2016). Safe, multi-agent, reinforcement learning for autonomous driving. arXiv preprint arXiv:1610.03295.
Sharma, S., Bhatt, M., & Sharma, P. (2020). Face recognition system using machine learning algorithm. 2020 5th International Conference on Communication and Electronics Systems (ICCES).
Shen, J., Zhang, C. J., Jiang, B., Chen, J., Song, J., Liu, Z., He, Z., Wong, S. Y., Fang, P.-H., & Ming, W.-K. (2019). Artificial intelligence versus clinicians in disease diagnosis: systematic review. JMIR Medical Informatics, 7(3), e10010.
Shi, J., Gao, Y., Wang, W., Yu, N., & Ioannou, P. A. (2019). Operating electric vehicle fleet for ride-hailing services with reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 21(11), 4822–4834.
Shi, H., Sun, H., Ali, M. A. S., & Mao, H. (2023). Exploring public preferences for ecosystem service improvements regarding nature reserve restoration: a choice experiment study. Ecological Indicators, 156, 111121.
Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., & Graepel, T. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 362(6419), 1140–1144.
Silver, D., Sutton, R. S., & Müller, M. (2007). Reinforcement Learning of Local Shape in the Game of Go. IJCAI.
Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. International conference on machine learning.
Singh, A., Al-Abbasi, A. O., & Aggarwal, V. (2021). A distributed model-free algorithm for multi-hop ride-sharing using deep reinforcement learning. IEEE Transactions on Intelligent Transportation Systems, 23(7), 8595–8605.
Skinner, G., & Walmsley, T. (2019). Artificial intelligence and deep learning in video games: A brief review. 2019 IEEE 4th International Conference on Computer and Communication Systems (ICCCS).
Skordilis, E., & Moghaddass, R. (2020). A deep reinforcement learning approach for real-time sensor-driven decision making and predictive analytics. Computers & Industrial Engineering, 147, 106600.
Srinivas, A., Sharma, S., & Ravindran, B. (2016). Dynamic frame skip deep Q network. arXiv preprint arXiv:1605.05365.
Strehl, A. L., Li, L., Wiewiora, E., Langford, J., & Littman, M. L. (2006). PAC model-free reinforcement learning. Proceedings of the 23rd international conference on Machine learning.
Suchman, E. (1968). Evaluative Research: Principles and Practice in Public Service and Social Action Programs. Russell Sage Foundation.
Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1–2), 181–211.
Sutton, R. S. (1992). Introduction: The challenge of reinforcement learning. Reinforcement Learning, pp. 1–3.
Tamba, T. A. (2021). Optimizing the Area Coverage of Networked UAVs using Multi-Agent Reinforcement Learning. 2021 International Conference on Instrumentation, Control, and Automation (ICA).
Tang, X., Qin, Z., Zhang, F., Wang, Z., Xu, Z., Ma, Y., Zhu, H., & Ye, J. (2019). A deep value-network based approach for multi-driver order dispatching. Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining.
Tong, Y., Shi, D., Xu, Y., Lv, W., Qin, Z., & Tang, X. (2021). Combinatorial optimization meets reinforcement learning: Effective taxi order dispatching at large-scale. IEEE Transactions on Knowledge and Data Engineering.
Tu, W., Ye, H., Mai, K., Zhou, M., Jiang, J., Zhao, T., Yi, S., & Li, Q. (2023). Deep online recommendations for connected E-taxis by coupling trajectory mining and reinforcement learning. International Journal of Geographical Information Science, 38(2), 216–242.
Verma, P., & Sharma, S. (2020). Artificial intelligence based recommendation system. 2020 2nd International Conference on Advances in Computing, Communication Control and Networking (ICACCCN).
Verma, T., Varakantham, P., Kraus, S., & Lau, H. C. (2017). Augmenting decisions of taxi drivers through reinforcement learning for improving revenues. Proceedings of the International Conference on Automated Planning and Scheduling.
von Wahl, L., Tempelmeier, N., Sao, A., & Demidova, E. (2022). Reinforcement learning-based placement of charging stations in urban road networks. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining.
Wang, Q., Zhang, W., Liu, Y., & Liu, Y. (2019). Multi-UAV dynamic wireless networking with deep reinforcement learning. IEEE Communications Letters, 23(12), 2243–2246.
Wang, H., Liang, W., & Yu, L.-F. (2020). Scene mover: Automatic move planning for scene arrangement by deep reinforcement learning. ACM Transactions on Graphics (TOG), 39(6), 1–15.
Wang, W., Zhou, Z., Chen, J., Cheng, W., & Chen, J. (2021). Analysis of location selection of public service facilities based on urban land accessibility. International Journal of Environmental Research and Public Health, 18(2), 516.
Wang, M., Zhang, J., Zhang, P., Cui, L., & Zhang, G. (2022). Independent double DQN-based multi-agent reinforcement learning approach for online two-stage hybrid flow shop scheduling with batch machines. Journal of Manufacturing Systems, 65, 694–708.
Wang, D., Wang, Q., Yin, Y., & Cheng, T. (2023). Optimization of ride-sharing with passenger transfer via deep reinforcement learning. Transportation Research Part E: Logistics and Transportation Review, 172, 103080.
Wang, Y., Sun, Y., Cao, X., Wang, Y., Zhang, W., & Cheng, X. (2023). A review of regional and global scale land use/land cover (LULC) mapping products generated from satellite remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 206, 311–334.
Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., & de Freitas, N. (2016). Dueling network architectures for deep reinforcement learning. International Conference on Machine Learning.
Wang, Z., Qin, Z., Tang, X., Ye, J., & Zhu, H. (2018). Deep reinforcement learning with knowledge transfer for online rides order dispatching. 2018 IEEE International Conference on Data Mining (ICDM).
Wang, Y., Tong, Y., Long, C., Xu, P., Xu, K., & Lv, W. (2019). Adaptive dynamic bipartite graph matching: A reinforcement learning approach. 2019 IEEE 35th International Conference on Data Engineering (ICDE).
Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279–292.
Wechsler, H. (2014). Neural networks for perception: human and machine perception. Academic Press.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229–256.
Wu, T., Zhou, P., Liu, K., Yuan, Y., Wang, X., Huang, H., & Wu, D. O. (2020). Multi-agent deep reinforcement learning for urban traffic light control in vehicular networks. IEEE Transactions on Vehicular Technology, 69(8), 8243–8256.
Xi, Y., Ding, Y., Cheng, Y., Zhao, J., Zhou, M., & Qin, S. (2023). Evaluation of the medical resource allocation: Evidence from China. Healthcare, 11(6), 829.
Xiao, J., Wang, G., Zhang, Y., & Cheng, L. (2020). A distributed multi-agent dynamic area coverage algorithm based on reinforcement learning. IEEE Access, 8, 33511–33521.
Xin, X., Karatzoglou, A., Arapakis, I., & Jose, J. M. (2020). Self-supervised reinforcement learning for recommender systems. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval.
Xu, D., Huang, X., Li, Z., & Li, X. (2020). Local motion simulation using deep reinforcement learning. Transactions in GIS, 24(3), 756–779.
Xu, D., Huang, X., Mango, J., Li, X., & Li, Z. (2021). Simulating multi-exit evacuation using deep reinforcement learning. Transactions in GIS, 25(3), 1542–1564.
Xu, M., Yue, P., Yu, F., Yang, C., Zhang, M., Li, S., & Li, H. (2023). Multi-agent reinforcement learning to unify order-matching and vehicle-repositioning in ride-hailing services. International Journal of Geographical Information Science, 37(2), 380–402.
Xu, Z., Li, Z., Guan, Q., Zhang, D., Li, Q., Nan, J., Liu, C., Bian, W., & Ye, J. (2018). Large-scale order dispatch in on-demand ride-hailing platforms: A learning and planning approach. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
Yang, H., & Wong, S. C. (1998). A network model of urban taxi services. Transportation Research Part B: Methodological, 32(4), 235–246.
Yang, L., Yu, X., Cao, J., Liu, X., & Zhou, P. (2021). Exploring deep reinforcement learning for task dispatching in autonomous on-demand services. ACM Transactions on Knowledge Discovery from Data (TKDD), 15(3), 1–23.
Yang, Y., Luo, R., Li, M., Zhou, M., Zhang, W., & Wang, J. (2018). Mean field multi-agent reinforcement learning. International Conference on Machine Learning.
Yuan, Q., Cong, G., Ma, Z., Sun, A., & Magnenat-Thalmann, N. (2013). Time-aware point-of-interest recommendation. Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval.
Zhang, W., Wang, Q., Li, J., & Xu, C. (2020). Dynamic fleet management with rewriting deep reinforcement learning. IEEE Access, 8, 143333–143341.
Zhang, Q., Lu, J., & Jin, Y. (2021). Artificial intelligence in recommender systems. Complex & Intelligent Systems, 7, 439–457.
Zhang, L., Hu, T., Min, Y., Wu, G., Zhang, J., Feng, P., Gong, P., & Ye, J. (2017). A taxi order dispatch model based on combinatorial optimization. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Zhang, Z., Luo, X., Liu, T., Xie, S., Wang, J., Wang, W., Li, Y., & Peng, Y. (2019). Proximal policy optimization with mixed distributed training. 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI).
Zhao, D., Wang, H., Shao, K., & Zhu, Y. (2016). Deep reinforcement learning with experience replay based on SARSA. 2016 IEEE Symposium Series on Computational Intelligence (SSCI).
Zhao, Z., Lee, C. K., Ren, J., & Tsang, Y. P. (2023). Optimal EV fast charging station deployment based on a reinforcement learning framework. IEEE Transactions on Intelligent Transportation Systems.
Zhong, Y., Wang, S., Liang, H., Wang, Z., Zhang, X., Chen, X., & Su, C. (2024). ReCovNet: Reinforcement learning with covering information for solving maximal coverage billboards location problem. International Journal of Applied Earth Observation and Geoinformation, 128, 103710.
Zhou, X., Wu, L., Zhang, Y., Chen, Z.-S., & Jiang, S. (2023). A robust deep reinforcement learning approach to driverless taxi dispatching under uncertain demand. Information Sciences, 646, 119401.
Zhou, M., Jin, J., Zhang, W., Qin, Z., Jiao, Y., Wang, C., Wu, G., Yu, Y., & Ye, J. (2019). Multi-agent reinforcement learning for order-dispatching via order-vehicle distribution matching. Proceedings of the 28th ACM International Conference on Information and Knowledge Management.
Zhu, H. (2020). Big data and artificial intelligence modeling for drug discovery. Annual Review of Pharmacology and Toxicology, 60, 573–589.
Zhu, Y., & Zhao, D. (2020). Online minimax Q network learning for two-player zero-sum Markov games. IEEE Transactions on Neural Networks and Learning Systems, 33(3), 1228–1241.
Zou, G., Tang, J., Yilmaz, L., & Kong, X. (2022). Online food ordering delivery strategies based on deep reinforcement learning. Applied Intelligence, 52, 6853–6865.
Acknowledgements
Not applicable.
Funding
This work was supported by the International Research Center of Big Data for Sustainable Development Goals [Grant Number CBAS2022GSP07], the Chongqing Natural Science Foundation [Grant Number CSTB2022NSCQMSX2069], and the Ministry of Education of China [Grant Number 19JZD023].
Author information
Contributions
Di Zhang, Conceptualization; Investigation; Methodology; Writing original draft; Data curation & Interview. Moyang Wang, Conceptualization; Writing – Review & Editing. Joseph Mango, Conceptualization; Writing – Review & Editing. Xiang Li, Resources; Writing – Review & Editing. Xianrui Xu, Resources; Writing – Review & Editing.
Ethics declarations
Competing interests
Not applicable.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhang, D., Wang, M., Mango, J. et al. A survey on applications of reinforcement learning in spatial resource allocation. Comput. Urban Sci. 4, 14 (2024). https://doi.org/10.1007/s43762-024-00127-z