1 Introduction

At many intersections, traffic lights are used to control traffic flow. Their phase pattern and timing can be static or dynamic. In the static model, the duration and order of the three phases (green, yellow, and red) are predetermined, whereas in the dynamic model the phases are allocated based on an assessment of the environment and the traffic intensity so that congestion is reduced. In classical approaches, even a slight change in environmental conditions requires a fundamental change in the structure of the control model. In other words, a dedicated fuzzy controller must be designed for every intersection where it is to be used, so the approach does not scale. Since traffic control is a real-time problem, the controller should base its decisions on real-time information about the number of vehicles in the environment. Vehicles at an intersection can be counted by road sensors, but their maintenance is costly. Among dynamic approaches, methods that use artificial intelligence to make decisions at intersections are simpler and more efficient.

Motivated by the above discussion, this research aims to design and simulate an efficient real-time method based on deep Q-learning, which combines deep neural networks with Q-learning. Q-learning is a type of model-free reinforcement learning that assigns a value, known as the Q-value, to each action. Based on the current state of the environment, the traffic light, acting as the agent, decides which lane should receive the green phase, i.e., which action to choose according to the Q-values. These values are estimated by a deep neural network, which enables better decision-making in complex environments such as traffic. A reward is then received to assess whether the agent's decision was useful and led to better control at the intersection. The traffic light should perform in such a way that the queue time (the time a car spends in the queue) is reduced compared with a fixed-time traffic light. Low, medium, and heavy traffic scenarios are simulated in Simulation of Urban Mobility (SUMO).

The main contributions of this paper are as follows:

  • Firstly, an individual intersection is studied. A four-way intersection is managed by a single agent, and the results of this method are compared with a static traffic light.

  • Secondly, two connected intersections are considered, each with its own agent. The cases where the agents are aware and unaware of each other's actions are compared.

The main structure of this work is shown in Fig. 1.

Fig. 1 General structure of the work

The remainder of this paper is organized as follows: the next section presents an overview of the literature. Section 3 reviews reinforcement learning. In Sect. 4, a single four-way intersection and its results are discussed. In Sect. 5, two connected intersections are studied and the results are provided. Finally, conclusions are drawn in Sect. 6.

2 Related work

Research has focused on controlling traffic congestion with different approaches. The first problem is gathering information about intersection traffic. Classical methods use road sensors to count the number of vehicles; their fundamental drawbacks are the cost of installing and maintaining the sensors. Acquiring information with cameras is another procedure, in which images are analyzed to classify whether traffic is heavy. The gradient magnitude and frame subtraction method [1] and the color-to-grayscale algorithm with a cascade classifier [2] are examples of image processing methods for car detection. As traffic control is a real-time problem, the solution must observe the environment and make decisions based on real-time information. The use of fuzzy logic for intersection traffic control has been discussed and led to a 10–21% decrease in delay time [3]. The main disadvantage of this method is its dependence on expert knowledge. In addition, it is specific to each intersection: the general structure cannot be transferred to other situations, so generalizability is limited. As a result, artificial intelligence methods are increasingly replacing these approaches.

In traffic forecasting, machine learning techniques can be used. In [4], deep autoencoders, deep belief networks, random forests, and Long Short-Term Memory (LSTM) networks were compared in terms of accuracy, precision, recall, and error in traffic prediction. Among the different techniques, LSTM achieved the best accuracy, at 95 percent.

Traffic control studies based on reinforcement learning have drawn attention in recent years. The main difference between these studies lies in how the state and the reward are defined. Among 160 articles reviewed by Nain et al. [5], the state was defined by queue size in 38% of the articles, by phase state in 11%, by the number of vehicles in 10%, by vehicle position in 6%, and by speed in 6%. As the reward, 30% of the articles used queue size, 13% delay, 9% waiting time, 6% the number of vehicles, and 4% throughput.

In order to optimize traffic signals, a methodology was proposed by Mao and Hai. Their paper aimed to reduce the effect of an accident on the total travel time (TTT) by combining a genetic algorithm with machine learning. The results showed a decrease in TTT of approximately 50% compared with the original genetic algorithm [6].

A study by Ge et al. [7] proposed a multi-agent deep Q-network to control a group of intersections. Each agent searches for the optimal strategy to control one intersection with a deep Q-network. Experience replay is used to increase the accuracy of the Q-values; this technique is associated with DQN and aims to reduce the time correlation in the training data [8–10].

The impacts of different reward definitions and state representations on the average waiting time per car were discussed by Razack et al. [8]. Using waiting time as the reward function with 80 states reduced the waiting time per car by 26 percent.

El-Tantawy and Abdullahi [9] used multi-agent reinforcement learning to control the traffic lights of five intersections. The agents were employed in two ways. In the first design, the agents were independent. In the second scheme, each agent coordinated its control measures with neighboring agents through the multi-agent reinforcement learning algorithm to minimize the total vehicle delay in the traffic network. The independent scheme performed 2–7% better than the integrated one.

Among the machine learning-based methods for traffic control, the linear regression model is common. This method is parametric with few parameters, yet it is difficult to apply here: its accuracy is low, it cannot handle the nonlinearity of traffic control, and it is not robust. Neural networks are considered non-parametric models. Compared with recurrent neural networks (RNN) and convolutional neural networks (CNN), the feed-forward neural network (FFNN) is simpler but less robust [10, 12].

Reviewing the state of the art reveals the need for an efficient reinforcement learning approach to traffic control; the challenge is how to tackle the reported setbacks and bring the simulation closer to real conditions. This article therefore aims to provide a suitable method for both a single intersection and a group of intersections.

3 Reinforcement learning

In reinforcement learning, an agent interacts with the environment to learn how to act in it, without prior knowledge, so as to maximize a defined reward function (or minimize a penalty). The best-known reinforcement learning technique is Q-learning (QL), which uses a trial-and-error approach to explore a complex, stochastic environment and selects the best action based on experience. Q-learning is model-free and assigns a value, called the Q-value, to each action taken in the environment; in effect, it determines the best action for each state. The definitions of the state representation, reward function, and action set play a crucial role in QL. If the agent's action contributes to the system goal, the agent receives a reward; otherwise, it incurs a penalty. The state represents the agent's current environment. An action is applied in state st, the environment moves to the next state st+1, and the agent seeks the maximum value obtainable in that next state. The Q-function receives a state and an action and returns the corresponding Q-value, which is updated using the observed reward as in (1).

$$Q\left(s_{t}, a_{t}\right)\leftarrow Q\left(s_{t}, a_{t}\right)+\eta \left( r_{t+1}+\gamma \max_{a_{t+1}} Q\left(s_{t+1}, a_{t+1}\right)-Q\left(s_{t}, a_{t}\right)\right)$$
(1)

where Q(st, at) is the value of action at in state st, and \(\eta\) is the learning rate, which lies between 0 and 1. As \(\eta\) approaches 0 the Q-value is barely updated, so almost nothing is learned; a high value means learning happens quickly. The max operator selects the most valuable action at+1 in state st+1. γ is the discount factor, which also lies between 0 and 1 and determines the importance of future rewards compared with the immediate reward [10–12].
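As an illustration, a minimal sketch of the tabular update in (1) is given below, assuming a small discrete state and action space stored as a NumPy table; the state and action indices and the parameter values are placeholders, not values from this work.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, eta=0.1, gamma=0.95):
    """One Q-learning step, Eq. (1): Q(s,a) += eta * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += eta * (td_target - Q[s, a])
    return Q

# Toy example: 5 states, 4 actions, one update after observing reward -1
Q = np.zeros((5, 4))
Q = q_update(Q, s=0, a=2, r=-1.0, s_next=1)
```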

In reinforcement learning, the policy followed during training is important. In this research, the ε-greedy method is used:

$${\varepsilon }_{h}=1-\frac{h}{H}$$
(2)

where εh is the probability of an exploratory action in the current episode h, out of H episodes in total; consequently, the probability of an exploitative action is 1 − εh. As the number of episodes increases, εh gradually approaches zero and exploitation is chosen more often.
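A short sketch of this policy is shown below, assuming the Q-values of the current state are already available as a list; the function names are illustrative only.

```python
import random

def epsilon(h, H):
    """Exploration probability for episode h out of H episodes, Eq. (2)."""
    return 1.0 - h / H

def epsilon_greedy(q_values, h, H):
    """Explore with probability eps_h, otherwise pick the action with the largest Q-value."""
    if random.random() < epsilon(h, H):
        return random.randrange(len(q_values))                    # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploitation
```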

3.1 DQN

Since the state representation is large, it is impractical to compute the Q-values from a Q-table. Therefore, DQN is used, combining Q-learning with a deep neural network [11–13]. In this method, the purpose is to approximate the Q-function with a deep neural network: the inputs of the network are the state representation and the outputs are the Q-values (Fig. 2). To select the number of neurons, there are three common rules of thumb: 1. the number of neurons can be between the size of the input layer and the size of the output layer; 2. the number of neurons can be the sum of the input layer size and the output layer size; 3. the number of neurons can be less than twice the size of the input layer [11, 13]. Two neural network models with four hidden layers are considered for one intersection. In the first model each hidden layer consists of 320 neurons, and in the second model of 480 neurons. For the two connected intersections, the first model is used, as it performs better in heavy traffic and reduces the queue time more (Fig. 3).
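A possible realization of the first network model (320 state inputs, four hidden layers of 320 neurons, and one output per action) is sketched below. The framework (PyTorch) and the ReLU activations are assumptions, since the paper does not specify them.

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Four hidden layers of n_hidden neurons; outputs one Q-value per action."""
    def __init__(self, n_states=320, n_hidden=320, n_actions=4):
        super().__init__()
        layers, size_in = [], n_states
        for _ in range(4):                                  # four hidden layers
            layers += [nn.Linear(size_in, n_hidden), nn.ReLU()]
            size_in = n_hidden
        layers.append(nn.Linear(size_in, n_actions))        # linear Q-value outputs
        self.net = nn.Sequential(*layers)

    def forward(self, state):
        return self.net(state)                              # shape: (batch, n_actions)

# Second model: QNetwork(n_hidden=480)
```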

Fig. 2 Neural network structure used to determine the Q-values

Fig. 3 Scheme of the single intersection

It is worth mentioning that experience replay is used during the training phase to improve the performance of the agent. In this method, the information that the agent gathers during the simulation is not used for training immediately. Instead, it is submitted in randomly drawn samples, called a batch, taken directly from a memory that stores every sample collected during the training phase.

A sample m is defined as follows.

$$m = \left\{ s_{t}, a_{t}, r_{t+1}, s_{t+1} \right\}$$
(3)

Figure 4 shows that the agent performs action at in state st and observes the new state st+1 while receiving the reward rt+1. The sample m = (st, at, rt+1, st+1) is stored in memory. A batch is then randomly selected from the memory buffer, and the Q-network is trained on this subset of samples. Subsequently, the neural network predicts the Q-values, which determine the next action at+1.
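A minimal replay-memory sketch consistent with this procedure is given below; the buffer capacity and batch size are assumptions, as the paper does not report them.

```python
import random
from collections import deque

class ReplayMemory:
    """Stores samples m = (s_t, a_t, r_{t+1}, s_{t+1}) and returns random batches,
    which breaks the time correlation between consecutive samples."""
    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=100):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```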

Fig. 4 Schematic of gathering the information

4 Single intersection

In this part of the study, the improvement of traffic flow at a single intersection is investigated. An agent is introduced to maximize a chosen measure of traffic efficiency: it receives the state of the environment and chooses the phase of the traffic light from a fixed set of actions so as to optimize the reward. After a number of simulation steps, the agent begins its next operation by collecting information about the current state of the environment, and it calculates the reward of the previously selected action from measures of the current traffic situation. A random batch of samples, containing information about the recent stages of the simulation, is drawn from memory and used for training. The agent is then ready to select a new action based on the current state of the environment, and the simulation continues until the next episode.
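This cycle can be summarized in the following sketch. The agent object (with select_action and train methods) and the helpers get_state(), apply_action(), simulate_steps(), and compute_reward() are hypothetical stand-ins for the SUMO/TraCI interactions and are not defined in the paper.

```python
def run_episode(agent, memory, n_steps=5400):
    """One training episode: observe, act, store the sample, and train on a random batch."""
    state = get_state()                       # hypothetical: read the 320-element state
    for _ in range(n_steps):
        action = agent.select_action(state)   # epsilon-greedy choice of green phase
        apply_action(action)                  # hypothetical: set the traffic-light phase in SUMO
        simulate_steps()                      # hypothetical: advance the simulation
        reward = compute_reward()             # Eq. (9): atqt_{t-1} - atqt_t
        next_state = get_state()
        memory.push(state, action, reward, next_state)
        agent.train(memory.sample())          # fit the Q-network on a random batch
        state = next_state
```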

The environment is a four-way intersection. Each approach is 750 m long and has four lanes. The three right-most lanes are used to go straight or turn right, while the left-most lane is used only for left turns and has its own traffic light. Eight traffic lights are considered at this intersection, each cycling through the green, yellow, and red phases.

4.1 Traffic generation

The Weibull distribution is used to generate three scenarios, with 4000, 2000, and 600 cars in the heavy, medium, and low scenarios, respectively. The simulation is run in Simulation of Urban Mobility (SUMO), which is not only open source but also easy to use for simulating and analyzing traffic congestion. In SUMO, 100 episodes are executed, each with 5400 steps; in total, the 100 episodes correspond to 150 h of real-world time.
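A sketch of this traffic generation is given below: departure times are drawn from a Weibull distribution and mapped onto the 5400 simulation steps. The shape parameter and the normalization are assumptions; the paper only fixes the distribution family and the number of cars per scenario.

```python
import numpy as np

def generate_departures(n_cars, n_steps=5400, shape=2.0, seed=None):
    """Draw n_cars departure steps from a Weibull distribution over one episode."""
    rng = np.random.default_rng(seed)
    samples = np.sort(rng.weibull(shape, n_cars))
    samples = (samples - samples.min()) / (samples.max() - samples.min())  # scale to [0, 1]
    return np.rint(samples * (n_steps - 1)).astype(int)

heavy = generate_departures(4000)    # heavy scenario
medium = generate_departures(2000)   # medium scenario
low = generate_departures(600)       # low scenario
```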

4.2 State representation

A frequently used approach for state representation is a feature-based value vector, in which the average or total value of specific quantities is used [14]. To obtain the states, each intersection approach is discretized into cells. The length of each cell depends on its distance from the intersection: cells close to the crossing are smaller, which means the nearby information is weighted more heavily than information farther away. The left lane is considered separately. In total there are 80 cells; Ck denotes the k-th cell (k being its position). Each Ck contains four data sets: the total number of vehicles, the average speed, the cumulative queue time of the vehicles in the cell, and the number of queued vehicles. With 80 cells and four quantities per cell, the state vector has 320 elements, which form the input of the neural network.

$$\vec{S} = \left( \vec{T}, \overrightarrow{AS}, \overrightarrow{Aqt}, \overrightarrow{NQ} \right)$$
(4)

where \(\overrightarrow{T}\) is the total number of cars in cell Ck, \(\overrightarrow{AS}\) is their average speed, \(\overrightarrow{Aqt}\) is their accumulated waiting time, and \(\overrightarrow{NQ}\) is the number of queued cars in Ck.
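The assembly of the 320-element state vector of (4) can be sketched as follows; cell_features(k) is a hypothetical function returning the four measurements of cell Ck, standing in for the SUMO queries.

```python
import numpy as np

N_CELLS = 80  # 80 discretized cells around the intersection

def build_state(cell_features):
    """Concatenate the four per-cell quantities into the 320-element state vector of Eq. (4)."""
    features = np.array([cell_features(k) for k in range(N_CELLS)])  # shape (80, 4)
    T, AS, Aqt, NQ = features.T
    return np.concatenate([T, AS, Aqt, NQ])                          # length 320
```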

4.3 Action set

Taking an action means activating the green phase for a set of lanes for a fixed duration. The set of actions is defined in (5); each action activates the green phase for a specific set of lanes. Opposing approaches receive the green phase simultaneously: when the northern approach gets the green phase, the southern approach gets it at the same time. Since the left lane has its own traffic light and is used only for turning, it requires a dedicated action. Table 1 describes the action set and indicates which traffic lights are given the green phase. NS is selected when the through lanes of the northern or southern arm are crowded, and NSL when the left lane of the north or south holds the most cars; EW and EWL are defined analogously for the eastern and western arms.

$$A = \left\{ {NS, \, NSL, \, EW, \, EWL} \right\}$$
(5)
Table 1 Action set description

4.4 Reward function

Different reward functions can be found in the literature. The total time that all cars have spent waiting in the queue can be expressed as follows [14, 15]:

$${tqt}_{t}=\sum_{veh=1}^{n}{qt}_{(veh,t)}$$
(6)

where tqtt is the total queue time, qt(veh,t) is the queue time of each car at step t, and n is the number of vehicles in the environment. The reward function can then be written as (7) [14, 15].

$${r}_{t}=t{qt}_{t-1}-t{qt}_{t}$$
(7)

The difference between the total queue time at steps t − 1 and t is taken as the reward at step t (rt). The main drawback of this reward definition appears in traffic jams, where long queues hold many cars: as soon as the most crowded lane receives the green phase, many cars start moving, and the resulting reward is collected not only at the next agent step but also over the following ones. The agent is therefore confused about which action produced the reward and may choose inappropriate actions. For this reason, the accumulated total queue time is proposed instead, as in (8).

$$atq{t}_{t}=\sum_{veh=1}^{n}aq{t}_{(veh, t)}$$
(8)

where atqtt is the accumulated total queue time of all vehicles at time step t, and aqt(veh,t) is the accumulated time during which a vehicle has had a velocity below 0.1 m/s up to step t, counted from the moment the car is generated in the environment. Again, n is the total number of vehicles in the environment. With this definition, when a vehicle moves but cannot pass the intersection, atqtt is not reset (unlike tqtt), which avoids the misleading reward of (7), based on the tqt of (6), when a long queue forms. The reward function used in this article is therefore as follows.

$${r}_{t}=at{qt}_{t-1}-at{qt}_{t}$$
(9)

where rt is the reward at step t, and atqtt and atqtt−1 are the accumulated total queue times of all vehicles at the intersection at steps t and t − 1, respectively.

It is worth noting that the reward defined in this work is negative, so its magnitude is to be minimized; with a positive reward definition, the objective would instead be maximization.
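A hedged sketch of the reward computation is given below, using the standard TraCI calls traci.vehicle.getIDList() and traci.vehicle.getSpeed(); the 0.1 m/s threshold follows the definition above, while the step length and the bookkeeping details are assumptions.

```python
import traci

class QueueTimeReward:
    """Tracks the accumulated queue time per vehicle and returns the reward of Eq. (9)."""
    def __init__(self, step_length=1.0):
        self.step_length = step_length
        self.acc_qt = {}        # aqt(veh, t): never reset while the vehicle is in the network
        self.prev_atqt = 0.0

    def __call__(self):
        vehicles = traci.vehicle.getIDList()
        for veh in vehicles:
            if traci.vehicle.getSpeed(veh) < 0.1:                    # queued: speed below 0.1 m/s
                self.acc_qt[veh] = self.acc_qt.get(veh, 0.0) + self.step_length
        atqt = sum(self.acc_qt.get(veh, 0.0) for veh in vehicles)    # Eq. (8)
        reward = self.prev_atqt - atqt                               # Eq. (9)
        self.prev_atqt = atqt
        return reward
```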

4.5 Result and discussion of a single intersection

The results of the comparison with a static traffic light (STL) are presented in Table 2 and Figs. 5, 6 and 7. The terms “first” and “second” in Figs. 5, 6 and 7 refer to the two neural network models introduced earlier. Fig. 5 shows the result for the heavy scenario: with the second model, the total waiting time of all cars was almost 3000 s lower than with the first model. However, for the medium and low scenarios, the first model controlled traffic more effectively; in the medium scenario its total queue time was roughly 6000 s lower than that of the second model (Fig. 6).

Table 2 The comparison between the smart traffic control and an STL
Fig. 5 Comparison of the total queue time in heavy traffic

Fig. 6 Comparison of the total queue time in medium traffic

Fig. 7 Comparison of the total queue time in low traffic

It is evident from Table 2 that, compared with the STL, the proposed method reduced both the total queue time and the queue time per vehicle. The medium scenario experienced the most significant change, with a decline of approximately 62 percent in total queue time. The time cars spent queuing in the heavy scenario was 22 percent lower than with the STL, while for the low scenario the decrease was 13 percent. The method thus performed better in the heavy scenario than in the low one, which is not a concern: by the nature of the low scenario, an STL can already be adequate when there are few cars, so a significant change was not expected there.

Moreover, when evaluating the total queue time of all cars in each interval, it is important to acknowledge that some vehicles may experience extended waiting periods that are not visible in the aggregate measurement. To address this, the negative reward function is plotted in Fig. 8, so the individual experience of each vehicle is taken into account. Figure 8 reveals that in the medium and heavy traffic scenarios the STL has the most negative reward, meaning that the agent of the smart traffic light learns and controls the intersection better. However, in the low scenario, as already noted for the queue time, the STL has the least negative reward, which suggests that an STL is efficient when there are few vehicles in the environment.

Fig. 8 The comparison between the negative rewards of the STL and the first and second models

5 Two connected intersections

In this section, two connected intersections are discussed, as shown in Fig. 9. Two agents are considered: a1 controls the left intersection and a2 the right one. The traffic scenarios and lane lengths are the same as for the single intersection, except that the medium scenario has an additional condition: 90 percent of the cars arrive from the eastern or western approaches.

Fig. 9 Schematic of two connected intersections

In the first case, two independent DQNs are used to see how two agents interact in the same environment. Each intersection corresponds to one agent, the reward is determined locally, and the state of each agent contains no information about the action of the other. In other words, each agent is unaware of what the other agent is doing and of how the other agent's actions may affect it; the interaction between the agents' decisions is ignored. In the second case, the agents share information: each agent uses the other agent's action to form its state.

5.1 New state representative

For the case in which the two agents interact, a state representation different from the one introduced for the single intersection is used. The idea is to connect the agents by sharing action knowledge. A function f is added that maps the four possible actions to the index set I = {0, 1, 2, 3}:

$$f: A \to I$$
(10)
$$f \, \left( {NS} \right) = 0, \, f \, \left( {NSL} \right) = 1, \, f \, \left( {EW} \right) = 2, \, f \, \left( {EWL} \right) = 3$$
(11)

Consequently, new state representation (NewS) becomes:

$$\overrightarrow{NewS}=\left(\vec{T}, \overrightarrow{AS}, \overrightarrow{Aqt}, \overrightarrow{NQ}, f\left(A\right)\right)$$
(12)

It is worth mentioning that this change applies only to the connected-agent tests; in the first case, the state representation is the same as for the single intersection.
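A sketch of the connected-agent state of (12) is shown below. The mapping follows (10) and (11); appending f(A) as a single scalar is an assumption, since the paper does not state the exact encoding.

```python
import numpy as np

ACTION_INDEX = {"NS": 0, "NSL": 1, "EW": 2, "EWL": 3}   # the mapping f of Eqs. (10)-(11)

def build_connected_state(local_state, neighbour_action):
    """Extend the 320-element local state with the index of the other agent's last action."""
    return np.concatenate([local_state, [ACTION_INDEX[neighbour_action]]])

# Usage: agent a1 appends a2's last action, and vice versa
# s1 = build_connected_state(state_of_a1, last_action_of_a2)
```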

5.2 Result and discussion of two connected intersections

Two different tests were studied. The first test used two agents to control the group of intersections by reinforcement learning; the second examined how the traffic at one intersection can influence the other agent's decisions. As presented in Table 3, the negative rewards of the two connected agents in the low and EW scenarios are more negative than in the first test, which does not share action information, whereas in heavy traffic the negative reward of the first test is more negative. This shows that the agents decide more efficiently in heavy traffic when the state representation carries information about the other agent's action. In Table 3, RSA denotes the Reward of Separate Agents and RCA the Reward of Connected Agents. Table 4 compares the two methods, showing how the average cumulative queue time and the average queue time per car change when the agents share actions. In the heavy traffic scenario, when each agent is aware of the other's action, the average queue time per car decreases by 53% compared with the separated agents. In the low and EW scenarios, assigning an independent agent to each intersection wastes less time than sharing knowledge. This can be justified because the main concern is the traffic jam, which is controlled well by the proposed approach; in the other scenarios the time loss is not significant, and in the low traffic scenario even an ordinary traffic light can manage the intersections well. Therefore, the idea of sharing action knowledge worked out. In El-Tantawy and Abdullahi's article [9, 10], the proposed model performs 2 to 7 percent better with independent agents, whereas the dependent-agent method presented here improves the heavy scenario by 53%.

Table 3 Comparison between the rewards of the two tests
Table 4 Percentage change with action knowledge sharing relative to the case of two separate agents

6 Conclusion

In this paper, an RL approach was presented to address the traffic congestion problem. A smart traffic light was designed for a single intersection and for two connected intersections in order to reduce the time vehicles wait to pass the intersection. In the first case, the proposed method reduced the average queue time per vehicle by 34 percent compared with a static traffic light. In the second test, the agents of the two intersections were considered both independent and connected. It was shown that the separated agents yield a lower total queue time in the low and EW scenarios, whereas sharing action knowledge performs better in heavy traffic. Using the information drawn from the discretized space helped the agents make appropriate decisions. It is worth mentioning that for complicated problems with a group of intersections, sharing action knowledge might not pay off, and a trade-off between model complexity and time is required. Future work could focus on a multi-agent approach in a heterogeneous environment, in which the number of lanes differs between approaches. To make the environment more realistic, emergency vehicles could also be considered, with the green phase allocated out of order for them.