Abstract
Tracking is the main enabling technology for Augmented Reality (AR), as it allows the realistic placement of virtual content in the real world. In this chapter, we discuss the most important aspects of tracking for AR while reviewing existing systems that have shaped the field in recent years. We first introduce a notation for describing 6 Degrees of Freedom (6DoF) poses and camera models. We then describe fundamental computer vision techniques that tracking systems frequently rely on, such as feature matching and tracking and pose estimation. We divide tracking approaches into model-based approaches and Simultaneous Localization and Mapping (SLAM) approaches. Model-based approaches use a synthetic representation of an object as a template to match the real object; this matching can use texture or lines as tracking features to establish correspondences between the model and the image, while machine learning approaches that estimate an object's pose directly from an input image have also been introduced recently. An emerging challenge is extending tracking systems for AR from rigid objects to articulated and nonrigid objects. SLAM systems, in contrast, require no reference model, as they track the camera and map the environment simultaneously. We discuss keypoint-based, direct, and semi-direct purely visual SLAM approaches. We then analyze additional sensors that can support tracking, such as visual-inertial sensor fusion and depth sensing. Finally, we look at the use of machine learning techniques, and especially deep neural networks, in conjunction with traditional computer vision approaches for SLAM.
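To make the fundamentals summarized above concrete: a 6DoF pose is commonly written as a rigid-body transform [R|t] in SE(3), and the pinhole camera model projects a 3D point X onto the image as x ≃ K[R|t]X, where K is the intrinsic matrix. The sketch below shows, under stated assumptions, how feature matching and pose estimation combine in a minimal tracking step: ORB features are matched between a reference view and the live image, and the 6DoF pose is recovered with PnP inside a RANSAC loop using OpenCV. This is an illustrative sketch, not the chapter's reference implementation; the function name estimate_pose, the 2D-3D correspondences in object_points (assumed aligned with the reference keypoints), and the calibrated intrinsics K are all assumptions introduced here.

```python
# Minimal sketch: feature matching + RANSAC-PnP pose estimation with OpenCV.
# Assumes object_points is an Nx3 array of 3D model coordinates aligned with
# the reference image's keypoint ordering, and K is a calibrated 3x3 intrinsic
# matrix; this is an illustration, not the chapter's implementation.
import cv2
import numpy as np

def estimate_pose(ref_img, live_img, object_points, K, dist=None):
    orb = cv2.ORB_create(nfeatures=1000)            # binary ORB features
    kp1, des1 = orb.detectAndCompute(ref_img, None)
    kp2, des2 = orb.detectAndCompute(live_img, None)

    # Brute-force Hamming matching; cross-check keeps only symmetric matches
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des1, des2)

    # Collect 3D model points (via the reference keypoint index) and their
    # 2D observations in the live image
    obj = np.float32([object_points[m.queryIdx] for m in matches])
    img = np.float32([kp2[m.trainIdx].pt for m in matches])

    # PnP inside a RANSAC loop rejects outlier correspondences
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(obj, img, K, dist)
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)                      # axis-angle -> rotation matrix
    return R, tvec                                  # pose: x_cam = R @ X + t
```

In a full model-based tracker, this geometric estimate would typically be refined further, e.g., by minimizing the reprojection error over the RANSAC inliers.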
Copyright information
© 2023 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rambach, J., Pagani, A., Stricker, D. (2023). Principles of Object Tracking and Mapping. In: Nee, A.Y.C., Ong, S.K. (eds) Springer Handbook of Augmented Reality. Springer Handbooks. Springer, Cham. https://doi.org/10.1007/978-3-030-67822-7_3
DOI: https://doi.org/10.1007/978-3-030-67822-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67821-0
Online ISBN: 978-3-030-67822-7
eBook Packages: Computer Science (R0)