Abstract
Autonomous vehicles (AVs) must handle multiple tasks with different priorities and safety levels, a setting where classic supervised learning techniques are no longer sufficient. Reinforcement learning (RL) algorithms are therefore increasingly appropriate for this domain, since an RL agent can act on complex problems and adapt its responses to unforeseen situations and environments. The agent aims to select the action that yields the highest expected reward. The problem with this approach is that the agent may settle on an action with a merely reasonable reward and get stuck in this mediocre strategy, which is neither the best nor the worst solution; it then avoids the wider exploration needed to discover new paths and alternative behaviors that would generate a higher reward. To alleviate this problem, we study the behavior of two types of noise during AV training, analyze the results, and identify the noise method that most stimulates exploration. Broad exploration of the environment is highly relevant to AVs because it lets them learn more about the environment and acquire alternative ways of acting under uncertainty, so they can respond more reliably to sudden changes in the environment. Our experiments in a simulator show that injecting noise allows the autonomous vehicle to improve its exploration and increase its reward.
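The abstract does not name the two noise types compared; as a hedged illustration only, two common choices for exploration noise in continuous-control RL are uncorrelated Gaussian noise and the temporally correlated Ornstein–Uhlenbeck process, which can be sketched as follows (all class and parameter names here are illustrative, not taken from the paper):

```python
import numpy as np

class GaussianNoise:
    """Uncorrelated Gaussian exploration noise: each sample is independent."""
    def __init__(self, dim, sigma=0.2):
        self.dim, self.sigma = dim, sigma

    def sample(self):
        return np.random.normal(0.0, self.sigma, self.dim)

class OrnsteinUhlenbeckNoise:
    """Temporally correlated noise: successive samples drift back toward
    the mean mu, producing smoother perturbations than i.i.d. Gaussian noise."""
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2, dt=1e-2):
        self.mu, self.theta, self.sigma, self.dt = mu, theta, sigma, dt
        self.x = np.full(dim, mu)

    def sample(self):
        dx = (self.theta * (self.mu - self.x) * self.dt
              + self.sigma * np.sqrt(self.dt) * np.random.normal(size=self.x.shape))
        self.x = self.x + dx
        return self.x

# Exploration: perturb the policy's action before executing it in the simulator.
noise = OrnsteinUhlenbeckNoise(dim=2)              # e.g. [steering, throttle]
policy_action = np.zeros(2)                        # placeholder deterministic action
action = np.clip(policy_action + noise.sample(), -1.0, 1.0)
```

Because Ornstein–Uhlenbeck samples are correlated across timesteps, the perturbed actions change gradually, which tends to produce more coherent trajectories than independent Gaussian perturbations when exploring a driving environment.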
Cite this article
Peixoto, M.J.P., Azim, A. Improving environmental awareness for autonomous vehicles. Appl Intell 53, 1842–1854 (2023). https://doi.org/10.1007/s10489-022-03468-6