A deep learning based fusion of RGB camera information and magnetic localization information for endoscopic capsule robots
A reliable, real-time localization functionality is crucial for actively controlled capsule endoscopy robots, which are an emerging, minimally invasive diagnostic and therapeutic technology for the gastrointestinal (GI) tract. In this study, we extend the success of deep learning approaches from various research fields to the problem of sensor fusion for endoscopic capsule robots. We propose a multi-sensor fusion based localization approach which combines endoscopic camera information and magnetic sensor based localization information. Experiments performed on a real pig stomach dataset show that our method achieves sub-millimeter precision for both translational and rotational movements.
Keywords: Deep learning based sensor fusion · Endoscopic capsule robots · RNN-CNN (RNN: recurrent neural network, CNN: convolutional neural network)
Robot localization denotes a robot's ability to establish its position and orientation within a frame of reference. The different sensors used in medical milliscale robot localization each have their own particular strengths and weaknesses, which makes sensor data fusion an attractive solution. Monocular visual-magnetic odometry approaches, for example, have received considerable attention in the mobile robotic sensor fusion literature. In general, localization techniques for endoscopic capsule robots can be categorized into three main groups: electromagnetic wave-based techniques, magnetic field strength-based techniques, and hybrid techniques Umay et al. (2017).
In recent years, numerous electromagnetic wave-based approaches have been proposed, such as time-of-flight (ToF) and time-difference-of-arrival (TDoA)-based, received-signal-strength (RSS)-based, RF-identification (RFID)-based, and angle-of-arrival (AoA)-based methods Wang et al. (2011); Fischer et al. (2004); Wang et al. (2009); Ye (2013); Hou et al. (2009).
In magnetic localization systems, the magnetic source and the magnetic sensor system are the essential components. The magnetic source can be designed in different ways: a permanent magnet, an embedded secondary coil, or a tri-axial magnetoresistive sensor. Magnetic sensors located outside the human body measure the magnetic flux density in order to estimate the location of the capsule (e.g., Popek et al. (2013), Natali et al. (2016), Yim and Sitti (2013)). A major advantage of magnetic field strength-based localization techniques is that they couple naturally with magnetic locomotion systems, such as magnetic steering, magnetic levitation, and remote magnetic manipulation. Another advantage is their robustness against attenuation by the human body. A disadvantage, however, is their susceptibility to interference from the environment, which can be mitigated with additional hardware dedicated to the localization problem.
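For a permanent-magnet source, the flux density measured by the external sensors is commonly modeled with the magnetic point-dipole equation. The NumPy sketch below is a generic illustration of that model, not the specific formulation of any of the cited systems:

```python
import numpy as np

MU0 = 4 * np.pi * 1e-7  # vacuum permeability (T*m/A)

def dipole_field(r, m):
    """Flux density B (tesla) of a point dipole with moment m (A*m^2)
    at displacement r (meters) from the magnet."""
    r = np.asarray(r, dtype=float)
    m = np.asarray(m, dtype=float)
    d = np.linalg.norm(r)
    r_hat = r / d
    # B = mu0 / (4*pi*d^3) * (3 (m . r_hat) r_hat - m)
    return MU0 / (4 * np.pi * d ** 3) * (3 * np.dot(m, r_hat) * r_hat - m)
```

An external sensor array measures B at several known locations, and the capsule pose is then recovered by fitting this forward model, e.g., with nonlinear least squares. A useful property of the model: the axial field is exactly twice as strong as the equatorial field at equal distance.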
Another group of endoscopic capsule robot localization techniques is the hybrid techniques. These integrate several sources at once, such as RF sensors, magnetic sensors, and RGB sensors. The core idea is to combine data from sources that complement each other and can thus produce more accurate localization. A common approach is the Kalman filter and its derivatives, which are used to fuse RF electromagnetic signal data, magnetic sensor data, and video data. The first group of hybrid methods fuses RF and video signals Geng and Pahlavan (2016); Bao et al. (2015), whereas the second group focuses on fusion of magnetic and RF signal data Umay and Fidan (2016); Geng and Pahlavan (2016); Umay and Fidan (2017), and the last group on fusion of magnetic and video data Gumprecht et al. (2013).
2 System architecture details
The proposed system consists of three main building blocks:
- Optical flow estimation
- CNN based feature vector extraction
- LSTM based sensor fusion
Although a frequently cited appeal of deep learning is its ability to operate directly on raw input data without requiring any pre- or post-processing, we do apply preprocessing, since in our evaluations we observed that it increases the accuracy of our method. This section explains the preprocessing operations applied to the raw RGB image data before it is passed into the deep neural network: vessel detection and enhancement, and keyframe selection.
2.1.1 Multi-scale vessel enhancement
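Vessel enhancement here follows the Hessian-based multiscale vesselness filter of Frangi et al. (1998). A single-scale sketch of that filter is shown below; the parameter values are illustrative, and in a full multiscale implementation the response is taken as the maximum over several scales \(\sigma \):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def vesselness(img, sigma=2.0, beta=0.5):
    """Single-scale Frangi vesselness for bright vessels on a dark background."""
    sm = gaussian_filter(img.astype(float), sigma)
    # Hessian from finite differences of the smoothed image.
    gy, gx = np.gradient(sm)
    hyy, _ = np.gradient(gy)
    hxy, hxx = np.gradient(gx)
    # Eigenvalues of the 2x2 Hessian, ordered so that |l1| <= |l2|.
    half_diff = np.sqrt(((hxx - hyy) / 2) ** 2 + hxy ** 2)
    mean = (hxx + hyy) / 2
    la, lb = mean + half_diff, mean - half_diff
    swap = np.abs(la) > np.abs(lb)
    l1 = np.where(swap, lb, la)        # smaller-magnitude eigenvalue
    l2 = np.where(swap, la, lb)        # larger-magnitude eigenvalue
    rb2 = (l1 / (l2 + 1e-12)) ** 2     # blobness measure R_b^2
    s2 = l1 ** 2 + l2 ** 2             # second-order structureness S^2
    c = 0.5 * np.sqrt(s2.max()) + 1e-12  # Frangi's suggestion: half of max S
    v = np.exp(-rb2 / (2 * beta ** 2)) * (1 - np.exp(-s2 / (2 * c ** 2)))
    return np.where(l2 < 0, v, 0.0)    # bright ridges have l2 < 0
```

The filter responds strongly to elongated, tube-like structures (one small and one large Hessian eigenvalue) while suppressing blobs and flat regions, which makes the vessel pattern of the stomach wall a stable visual feature.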
2.1.2 Keyframe selection
1. Choose a candidate keyframe and extract the Farnebäck optical flow between this frame and the reference keyframe.
2. Compute the magnitude of the extracted optical flow vector for each pixel.
3. Calculate the cumulative value by summing all the magnitude values.
4. Normalize the cumulative value by the total number of pixels.
5. If the normalized cumulative value is less than \(\tau \), go to the next frame. Otherwise, identify the candidate keyframe as a keyframe and repeat the process.
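The selection rule above reduces to a small helper function. In practice the dense flow field would come from, e.g., OpenCV's `cv2.calcOpticalFlowFarneback`, but the thresholding logic is independent of how the flow was computed; \(\tau \) is a tuning parameter:

```python
import numpy as np

def is_keyframe(flow, tau):
    """flow: H x W x 2 dense optical-flow field between the candidate
    frame and the reference keyframe; tau: mean-displacement threshold."""
    mag = np.linalg.norm(flow, axis=2)   # per-pixel flow magnitude
    mean_mag = mag.sum() / mag.size      # cumulative magnitude / #pixels
    return mean_mag >= tau               # keyframe iff motion is large enough
```

Frames are scanned in order; whenever `is_keyframe` fires, the candidate is promoted to the new reference keyframe and scanning continues. This drops near-redundant frames where the capsule has barely moved.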
2.2 Optical flow extraction
2.3 Magnetic localization system
2.4 Deep CNN-RNN architecture for sensor fusion
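As a concrete illustration of such a CNN-RNN fusion network, the PyTorch sketch below feeds per-frame optical-flow images through a small CNN, concatenates the resulting feature vector with the magnetic localization measurements, and models the sequence with an LSTM that regresses a 6-DoF pose per frame. All layer sizes and the magnetic input dimension here are hypothetical; the sketch only illustrates the overall topology, not the exact architecture used in this work:

```python
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, mag_dim=6, hidden=128):
        super().__init__()
        # Per-frame feature extractor over 2-channel optical-flow images.
        self.cnn = nn.Sequential(
            nn.Conv2d(2, 16, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())    # -> 32*4*4 = 512
        # LSTM fuses visual features with magnetic measurements over time.
        self.lstm = nn.LSTM(512 + mag_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 6)  # 3-DoF translation + 3-DoF rotation

    def forward(self, flow_seq, mag_seq):
        # flow_seq: (B, T, 2, H, W); mag_seq: (B, T, mag_dim)
        b, t = flow_seq.shape[:2]
        feats = self.cnn(flow_seq.flatten(0, 1)).view(b, t, -1)
        fused, _ = self.lstm(torch.cat([feats, mag_seq], dim=2))
        return self.head(fused)           # per-frame 6-DoF pose
```

Trained end-to-end, e.g., with `torch.optim.Adam(net.parameters(), lr=0.001)`, such a network jointly learns feature extraction and sequential motion modeling, so only the hyperparameters need hand-tuning.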
learning rate: 0.001
solver type: Adam
batch size: 64
GPU: NVIDIA K80
We evaluate the performance of our system both quantitatively and qualitatively in terms of trajectory estimation. We also report the computational time requirements of the method.
4.1 Trajectory estimation
The absolute trajectory error (ATE) root-mean-square error (RMSE) metric is used for quantitative comparisons; it measures the root mean square of the Euclidean distances between all estimated endoscopic capsule robot poses and the corresponding ground-truth poses. We created six different trajectories with various complexity levels. Overfitting, which would make the resulting pose estimator inapplicable in other scenarios, was prevented using dropout and early stopping. Dropout, which samples a sub-network of the whole network and updates only its parameters for a given input, is a simple and highly effective regularization method for avoiding overfitting.
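Assuming the estimated and ground-truth trajectories are already expressed in (or rigidly aligned to) a common reference frame, the ATE RMSE over the translational components reduces to a few lines of NumPy (the standard ATE formulation additionally aligns the two trajectories with a rigid-body fit first):

```python
import numpy as np

def ate_rmse(estimated, ground_truth):
    """Root mean square of Euclidean distances between corresponding
    estimated and ground-truth positions (both N x 3 arrays)."""
    d = np.linalg.norm(estimated - ground_truth, axis=1)  # per-pose error
    return float(np.sqrt(np.mean(d ** 2)))
```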
In this study, we presented, to the best of our knowledge, the first sensor fusion method based on deep learning for endoscopic capsule robots. The proposed CNN-RNN architecture based fusion approach is able to achieve simultaneous learning and sequential modeling of motion dynamics across frames and magnetic data streams. Since it is trained in an end-to-end manner, there is no need to carefully hand-tune the parameters of the system except the hyperparameters. In the future, we will incorporate controlled actuation into the scenario to investigate a more complete system, and additionally we will seek ways to make the system more robust against representational singularities in the rotation data.
Open access funding provided by Max Planck Society.
- Arshak, K., Adepoju, F.: Capsule tracking in the GI tract: a novel microcontroller based solution. In: Proceedings of the 2006 IEEE Sensors Applications Symposium, pp. 186–191. IEEE (2006)
- Beauchemin, S.S., Barron, J.L.: The computation of optical flow. ACM Comput Surv 27(3), 433–466 (1995)
- Cornelius, N., Kanade, T.: Adapting optical-flow to measure object motion in reflectance and X-ray image sequences. In: Proceedings of ACM SIGGRAPH Computer Graphics, vol. 18, pp. 24–25. ACM, New York, NY (1984). https://doi.org/10.1145/988525.988537
- Clark, R., Wang, S., Wen, H., Markham, A., Trigoni, N.: VINet: visual-inertial odometry as a sequence-to-sequence learning problem. In: AAAI, pp. 3995–4001 (2017)
- Farnebäck, G.: Two-frame motion estimation based on polynomial expansion. In: Bigun, J., Gustavsson, T. (eds.) Image Analysis. Lecture Notes in Computer Science, vol. 2749, pp. 363–370. Springer, Heidelberg (2003)
- Frangi, A.F., Niessen, W.J., Vincken, K.L., Viergever, M.A.: Multiscale vessel enhancement filtering. In: Wells, W.M., Colchester, A., Delp, S. (eds.) Medical Image Computing and Computer-Assisted Intervention—MICCAI'98. Lecture Notes in Computer Science, vol. 1496, pp. 130–137. Springer, Heidelberg (1998)
- Hou, J., Zhu, Y., Zhang, L., Fu, Y., Zhao, F., Yang, L., Rong, G.: Design and implementation of a high resolution localization system for in-vivo capsule endoscopy. In: Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing (DASC'09), pp. 209–214. IEEE (2009)
- Popek, K.M., Mahoney, A.W., Abbott, J.J.: Localization method for a magnetic capsule endoscope propelled by a rotating magnetic dipole field. In: 2013 IEEE International Conference on Robotics and Automation (ICRA), pp. 5348–5353. IEEE (2013)
- Son, D., Dogan, M.D., Sitti, M.: Magnetically actuated soft capsule endoscope for fine-needle aspiration biopsy. In: 2017 IEEE International Conference on Robotics and Automation (ICRA), pp. 1132–1139. IEEE (2017)
- Turan, M., Almalioglu, Y., Araujo, H., Konukoglu, E., Sitti, M.: A non-rigid map fusion-based RGB-depth SLAM method for endoscopic capsule robots. arXiv preprint arXiv:1705.05444 (2017)
- Umay, I., Fidan, B.: Adaptive magnetic sensing based wireless capsule localization. In: 2016 10th International Symposium on Medical Information and Communication Technology (ISMICT), pp. 1–5. IEEE (2016)
- Wang, L., Hu, C., Tian, L., Li, M., Meng, M.Q.H.: A novel radio propagation radiation model for location of the capsule in GI tract. In: 2009 IEEE International Conference on Robotics and Biomimetics (ROBIO), pp. 2332–2337. IEEE (2009)
- Wang, Y., Fu, R., Ye, Y., Khan, U., Pahlavan, K.: Performance bounds for RF positioning of endoscopy camera capsules. In: 2011 IEEE Topical Conference on Biomedical Wireless Technologies, Networks, and Sensing Systems (BioWireleSS), pp. 71–74. IEEE (2011)
- Ye, Y.: Bounds on RF cooperative localization for video capsule endoscopy. Ph.D. thesis, Worcester Polytechnic Institute (2013)
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.