Robot-assisted flexible needle insertion using universal distributional deep reinforcement learning
Flexible needle insertion is an important minimally invasive surgical approach for biopsy and radio-frequency ablation. It can minimize intraoperative trauma and improve postoperative recovery. We propose a new path planning framework that uses multi-goal deep reinforcement learning to cope with uncertain needle–tissue interactions and to enhance the robustness of the robot-assisted insertion process.
This framework uses a new algorithm called universal distributional Q-learning (UDQL) to learn a stable steering policy and to perform risk management by visualizing the learned Q-value distribution. To further improve robustness, universal value function approximation is leveraged in the training process of UDQL to maximize generalization and to support diagnosis through fast re-planning and transfer learning.
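The idea of learning a distribution over returns, rather than a single Q-value, can be illustrated with a minimal sketch. It is not the UDQL algorithm itself: it shows the categorical projection step from distributional reinforcement learning (in the style of Bellemare et al. 2017) and a mean/spread summary that could feed a risk display. The function names, the support range, and the example reward of -1 per step are all assumptions made for illustration.

```python
import math


def make_support(v_min, v_max, n_atoms):
    """Evenly spaced return values (atoms) on which the distribution lives."""
    step = (v_max - v_min) / (n_atoms - 1)
    return [v_min + i * step for i in range(n_atoms)]


def project_target(next_probs, reward, done, gamma, support):
    """Project the shifted distribution r + gamma * z back onto the fixed support.

    next_probs: probabilities over `support` for the next state (and goal).
    Returns a new probability list over the same support.
    """
    v_min, v_max = support[0], support[-1]
    n = len(support)
    delta = (v_max - v_min) / (n - 1)
    target = [0.0] * n
    for p, z in zip(next_probs, support):
        tz = reward if done else reward + gamma * z
        tz = min(max(tz, v_min), v_max)       # clip to the support range
        b = (tz - v_min) / delta              # fractional atom index, b >= 0
        lo = min(int(b), n - 1)               # lower neighbouring atom
        hi = min(lo + 1, n - 1)               # upper neighbouring atom
        w_hi = min(max(b - lo, 0.0), 1.0) if hi != lo else 0.0
        target[lo] += p * (1.0 - w_hi)        # split mass between neighbours
        target[hi] += p * w_hi
    return target


def mean_and_std(probs, support):
    """Expected return and its spread; the spread is a simple risk indicator."""
    mean = sum(p * z for p, z in zip(probs, support))
    var = sum(p * (z - mean) ** 2 for p, z in zip(probs, support))
    return mean, math.sqrt(max(var, 0.0))


# Hypothetical example: 11 atoms on [-10, 0], a step cost of -1,
# and a next-state distribution concentrated on a return of -3.
support = make_support(-10.0, 0.0, 11)
next_probs = [0.0] * 11
next_probs[7] = 1.0
target = project_target(next_probs, reward=-1.0, done=False, gamma=0.5, support=support)
mean, std = mean_and_std(target, support)
```

In a goal-conditioned setting, the network that outputs `next_probs` would take the target position as an extra input alongside the needle state, so that one set of weights serves several targets. A wide spread in the projected distribution flags a steering action whose outcome is uncertain, which is the kind of information the abstract describes presenting to clinicians.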
Computer simulation and phantom experiments show that the proposed framework can securely steer flexible needles with high insertion accuracy and robustness. The framework also improves robustness by providing distribution information to clinicians for diagnosis and decision making during surgery.
Compared with previous methods, the proposed framework can perform multi-target needle insertion through a single insertion point under a continuous state-space model with higher accuracy and robustness.
Keywords: Deep learning, Deep reinforcement learning, Needle steering, Tool–tissue interaction, Uncertainty
The last author would like to acknowledge the contribution of A/Prof Stephen Chang of Mount Elizabeth Hospital, Singapore, for his input on surgeries and medical education.
The research and development of the prototype Image-guided Radio-frequency Ablation Surgical System was supported in part by research grants from the Singapore Agency of Science and Technology (A*Star) and the Ministry of Education, Singapore.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Human and animal rights
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.