Wide baseline pose estimation from video with a density-based uncertainty model

Pellicanò, Nicola; Aldea, Emanuel; Le Hégarat-Mascle, Sylvie

doi:10.1007/s00138-019-01036-6

Wide baseline pose estimation from video with a density-based uncertainty model

Original Paper
Published: 13 June 2019

Volume 30, pages 1041–1059, (2019)
Cite this article

Machine Vision and Applications Aims and scope Submit manuscript

Nicola Pellicanò¹,
Emanuel Aldea¹ &
Sylvie Le Hégarat-Mascle¹

277 Accesses
4 Citations
Explore all metrics

Abstract

Robust wide baseline pose estimation is an essential step in the deployment of smart camera networks. In this work, we highlight some current limitations of conventional strategies for relative pose estimation in difficult urban scenes. Then, we propose a solution which relies on an adaptive search of corresponding interest points in synchronized video streams which allows us to converge robustly toward a high-quality solution. The core idea of our algorithm is to build across the image space a nonstationary mapping of the local pose estimation uncertainty, based on the spatial distribution of interest points. Subsequently, the mapping guides the selection of new observations from the video stream in order to prioritize the coverage of areas of high uncertainty. With an additional step in the initial stage, the proposed algorithm may also be used for refining an existing pose estimation based on the video data; this mode allows for performing a data-driven self-calibration task for stereo rigs for which accuracy is critical, such as onboard medical or vehicular systems. We validate our method on three different datasets which cover typical scenarios in pose estimation. The results show a fast and robust convergence of the solution, with a significant improvement, compared to single image-based alternatives, of the RMSE of ground-truth matches, and of the maximum absolute error.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Scale Drift Correction of Camera Geo-Localization Using Geo-Tagged Images

Video Registration to SfM Models

CRF-Based Reconstruction from Narrow-Baseline Image Sequences

Notes

Implementation available at: https://github.com/MOHICANS-project/fundvid.

References

Ataer-Cansizoglu, E., Taguchi, Y., Ramalingam, S., Miki, Y.: Calibration of non-overlapping cameras using an external slam system. In: 2nd International Conference on 3D Vision (3DV), vol. 1, pp. 509–516. IEEE (2014)
Ayaz, S.M., Kim, M.Y., Park, J.: Survey on zoom-lens calibration methods and techniques. Mach. Vis. Appl. 28(8), 803–818 (2017)
Article Google Scholar
Boutros, N., Shortis, M.R., Harvey, E.S.: A comparison of calibration methods and system configurations of underwater stereo-video systems for applications in marine ecology. Limnol. Oceanogr. Methods 13(5), 224–236 (2015)
Article Google Scholar
Brückner, M., Bajramovic, F., Denzler, J.: Intrinsic and extrinsic active self-calibration of multi-camera systems. Mach. Vis. Appl. 25(2), 389–403 (2014)
Article Google Scholar
Caspi, Y., Simakov, D., Irani, M.: Feature-based sequence-to-sequence matching. Int. J. Comp. Vis. 68(1), 53–64 (2006)
Article Google Scholar
Chaquet, J.M., Carmona, E.J., Fernández-Caballero, A.: A survey of video datasets for human action and activity recognition. Comput. Vis. Image Underst. 117(6), 633–659 (2013)
Article Google Scholar
Chum, O., Matas, J.: Matching with prosac-progressive sample consensus. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. 1, pp. 220–226. IEEE (2005)
Conte, D., Foggia, P., Percannella, G., Vento, M.: Counting moving persons in crowded scenes. Mach. Vis. Appl. 24(5), 1029–1042 (2013)
Article Google Scholar
Dang, T., Hoffmann, C., Stiller, C.: Continuous stereo self-calibration by camera parameter tracking. IEEE Trans. Image Process. 18(7), 1536–1550 (2009)
Article MathSciNet MATH Google Scholar
Devarajan, D., Radke, R.J., Chung, H.: Distributed metric calibration of ad hoc camera networks. ACM Trans. Sensor Netw. (TOSN) 2(3), 380–403 (2006)
Article Google Scholar
Dubuisson, S., Gonzales, C.: A survey of datasets for visual tracking. Mach. Vis. Appl. 27(1), 23–52 (2016)
Article Google Scholar
Eshel, R., Moses, Y.: Tracking in a dense crowd using multiple cameras. Int. J. Comput. Vis. 88(1), 129–143 (2010). https://doi.org/10.1007/s11263-009-0307-0
Article Google Scholar
Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. Kdd 96, 226–231 (1996)
Google Scholar
Ferryman, J., Shahrokni, A.: Pets2009: Dataset and challenge. In: 12th IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (PETS-Winter), 2009, pp. 1–6. IEEE (2009)
Foroughi, H., Ray, N., Zhang, H.: Robust people counting using sparse representation and random projection. Pattern Recognit. 48(10), 3038–3052 (2015)
Article Google Scholar
Fradi, H., Luvison, B., Pham, Q.C.: Crowd behavior analysis using local mid-level visual descriptors. IEEE Trans. Circuits Syst. Video Technol. 27(3), 589–602 (2017). https://doi.org/10.1109/TCSVT.2016.2615443
Article Google Scholar
Fraundorfer, F., Tanskanen, P., Pollefeys, M.: A minimal case solution to the calibrated relative pose problem for the case of two known orientation angles. Comput. Vis.-ECCV 2010, 269–282 (2010)
Google Scholar
Gemeiner, P., Micusik, B., Pflugfelder, R.: Calibration Methodology for Distant Surveillance Cameras, pp. 162–173. Springer, Cham (2015)
Google Scholar
Goldman, Y., Rivlin, E., Shimshoni, I.: Robust epipolar geometry estimation using noisy pose priors. Image Vis. Comput. 67, 16–28 (2017)
Article Google Scholar
Guo, X., Cao, X.: Triangle-constraint for finding more good features. In: International Conference on Pattern Recognition (ICPR), pp. 1393–1396 (2010)
Hansen, P., Alismail, H., Rander, P., Browning, B.: Online continuous stereo extrinsic parameter estimation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012, pp. 1059–1066. IEEE (2012)
Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, second edn. Cambridge University Press, ISBN: 0521540518 (2004)
Kasten, Y., Ben-Artzi, G., Peleg, S., Werman, M.: Fundamental matrices from moving objects using line motion barcodes. In: European Conference on Computer Vision, pp. 220–228. Springer (2016)
Khan, S.M., Shah, M.: Tracking multiple occluding people by localizing on multiple scene planes. IEEE Trans. Pattern Anal. Mach. Intell. 31(3), 505–519 (2009)
Article Google Scholar
Kneip, L., Chli, M., Siegwart, R.Y.: Robust real-time visual odometry with a single camera and an IMU. In: Proceedings of the British Machine Vision Conference 2011. British Machine Vision Association (2011)
Lin, B., Johnson, A., Qian, X., Sanchez, J., Sun, Y.: Simultaneous tracking, 3d reconstruction and deforming point detection for stereoscope guided surgery. In: Augmented Reality Environments for Medical Imaging and Computer-Assisted Interventions, pp. 35–44. Springer (2013)
Lin, W.Y., Cheong, L.F., Tan, P., Dong, G., Liu, S.: Simultaneous camera pose and correspondence estimation with motion coherence. Int. J. Comput. Vis. 96(2), 145–161 (2012)
Article MathSciNet MATH Google Scholar
Lin, W.Y., Liu, S., Jiang, N., Do, M.N., Tan, P., Lu, J.: Repmatch: Robust feature matching and pose for reconstructing modern cities. In: European Conference on Computer Vision, pp. 562–579. Springer (2016)
Ling, Y., Shen, S.: High-precision online markerless stereo extrinsic calibration. In: IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 1771–1778. IEEE (2016)
Liu, Z., Monasse, P., Marlet, R.: Match selection and refinement for highly accurate two-view structure from motion. In: European Conference on Computer Vision, pp. 818–833. Springer (2014)
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comp. Vis. 60(2), 91–110 (2004)
Article Google Scholar
Madrigal, F., Hayet, J.B., Rivera, M.: Motion priors for multiple target visual tracking. Mach. Vis. Appl. 26(2–3), 141–160 (2015)
Article Google Scholar
Mahmoud, N., Hostettler, A., Collins, T., Soler, L., Doignon, C., Montiel, J.M.M.: SLAM based quasi dense reconstruction for minimally invasive surgery scenes. ICRA 2017 workshop C4 Surgical Robots: Compliant, Continuum, Cognitive, and Collaborative (2017)
Maier-Hein, L., Groch, A., Bartoli, A., Bodenstedt, S., Boissonnat, G., Chang, P.L., Clancy, N., Elson, D.S., Haase, S., Heim, E., et al.: Comparative validation of single-shot optical techniques for laparoscopic 3-d surface reconstruction. IEEE Trans. Med. Imaging 33(10), 1913–1930 (2014)
Article Google Scholar
Martinec, D., Pajdla, T.: Robust rotation and translation estimation in multiview reconstruction. In: IEEE Conference on Computer Vision and Pattern Recognition, 2007, CVPR’07, pp. 1–8. IEEE (2007)
Mavrinac, A., Chen, X.: Modeling coverage in camera networks: a survey. Int. J. Comput. Vis. 101(1), 205–226 (2013)
Article MathSciNet Google Scholar
Mehmood, M.O., Ambellouis, S., Achard, C.: Ghost pruning for people localization in overlapping multicamera systems. In: International Conference on Computer Vision Theory and Applications (VISAPP), 2014, vol. 2, pp. 632–639. IEEE (2014)
Milan, A., Roth, S., Schindler, K.: Continuous energy minimization for multitarget tracking. IEEE Trans. Pattern Anal. Mach. Intell. 36(1), 58–72 (2014)
Article Google Scholar
Moisan, L., Stival, B.: A probabilistic criterion to detect rigid point matches between two images and estimate the fundamental matrix. Int. J. Comp. Vis. 57(3), 201–218 (2004)
Article Google Scholar
Mountney, P., Stoyanov, D., Yang, G.Z.: Three-dimensional tissue deformation recovery and tracking. IEEE Signal Process. Mag. 27(4), 14–24 (2010)
Article Google Scholar
Mountney, P., Yang, G.Z.: Dynamic view expansion for minimally invasive surgery using simultaneous localization and mapping. In: Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2009, EMBC 2009, pp. 1184–1187. IEEE (2009)
Mueller, G.R., Wuensche, H.J.: Continuous extrinsic online calibration for stereo cameras. In: Intelligent Vehicles Symposium (IV), 2016 IEEE, pp. 966–971. IEEE (2016)
Ochoa, B., Belongie, S.: Covariance propagation for guided matching. In: Workshop on Statistical Methods in Multi-Image and Video Processing (2006)
Pellicano, N., Aldea, E., Le Hégarat-Mascle, S.: Robust wide baseline pose estimation from video. In: 23rd International Conference on Pattern Recognition (ICPR), 2016, pp. 3820–3825. IEEE (2016)
Pellicanò, N., Aldea, E., Le Hégarat-Mascle, S.: Geometry-based multiple camera head detection in dense crowds. In: Proceedings of the 28th British Machine Vision Conference (BMVC)—5th Activity Monitoring by Multiple Distributed Sensing Workshop (2017)
Peng, P., Tian, Y., Wang, Y., Li, J., Huang, T.: Robust multiple cameras pedestrian detection with multi-view bayesian network. Pattern Recognit. 48(5), 1760–1772 (2015)
Article Google Scholar
Pollefeys, M., Koch, R., Van Gool, L.: Self-calibration and metric reconstruction inspite of varying and unknown intrinsic camera parameters. Int. J. Comput. Vis. 32(1), 7–25 (1999)
Article Google Scholar
Pollok, T., Monari, E.: A visual slam-based approach for calibration of distributed camera networks. In: 13th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2016, Colorado Springs, CO, USA, August 23–26, 2016, pp. 429–437 (2016). https://doi.org/10.1109/AVSS.2016.7738081
Puig, L., Daniilidis, K.: Monocular 3d tracking of deformable surfaces. In: IEEE International Conference on Robotics and Automation (ICRA), 2016, pp. 580–586. IEEE (2016)
Radke, R.J.: A survey of distributed computer vision algorithms. Handbook of Ambient Intelligence and Smart Environments pp. 35–55 (2010)
Raguram, R., Chum, O., Pollefeys, M., Matas, J., Frahm, J.M.: Usac: a universal framework for random sample consensus. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 2022–2038 (2013)
Article Google Scholar
Ravichandran, A., Vidal, R.: Video registration using dynamic textures. Patt. Anal. Mach. Intell. 33(1), 158–171 (2011)
Article Google Scholar
Remondino, F., Fraser, C.: Digital camera calibration methods: considerations and comparisons. Int. Arch. Photogr. Rem. Sens. Spat. Inf. Sci. 36(5), 266–272 (2006)
Google Scholar
SanMiguel, J.C., Micheloni, C., Shoop, K., Foresti, G.L., Cavallaro, A.: Self-reconfigurable smart camera networks. IEEE Comput. 47(5), 67–73 (2014)
Article Google Scholar
Sekii, T.: Robust, real-time 3d tracking of multiple objects with similar appearances. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4275–4283 (2016)
Sheather, S.J., Jones, M.C.: A reliable data-based bandwidth selection method for kernel density estimation. J. R. Stat. Soc. Ser. B (Methodol.) 53(3), 683–690 (1991)
MathSciNet MATH Google Scholar
Smeulders, A.W., Chu, D.M., Cucchiara, R., Calderara, S., Dehghan, A., Shah, M.: Visual tracking: An experimental survey. IEEE Trans. Pattern Anal. Mach. Intell. 36(7), 1442–1468 (2014)
Article Google Scholar
Snavely, N., Seitz, S.M., Szeliski, R.: Modeling the world from internet photo collections. Int. J. Comp. Vis. 80(2), 189–210 (2008)
Article Google Scholar
STEREOLABS: ZED Stereo Camera (2018). https://www.stereolabs.com/
Sur, F., Noury, N., Berger, M.O.: Computing the uncertainty of the 8 point algorithm for fundamental matrix estimation. In: 19th British Machine Vision Conference-BMVC 2008, p. 10 (2008)
Tan, X., Sun, C., Sirault, X., Furbank, R., Pham, T.D.: Feature matching in stereo images encouraging uniform spatial distribution. Pattern Recognit. 48(8), 2530–2542 (2015)
Article Google Scholar
Tang, N.C., Lin, Y.Y., Weng, M.F., Liao, H.Y.M.: Cross-camera knowledge transfer for multiview people counting. IEEE Trans. Image Process. 24(1), 80–93 (2015)
Article MathSciNet MATH Google Scholar
Tang, S., Andriluka, M., Milan, A., Schindler, K., Roth, S., Schiele, B.: Learning people detectors for tracking in crowded scenes. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1049–1056 (2013)
Totz, J., Mountney, P., Stoyanov, D., Yang, G.Z.: Dense surface reconstruction for enhanced navigation in mis. Med. Image Comput. Comput.-Assist. Interv.-MICCAI 2011, 89–96 (2011)
Google Scholar
Tsai, R.: A versatile camera calibration technique for high-accuracy 3d machine vision metrology using off-the-shelf tv cameras and lenses. IEEE J. Robot. Autom. 3(4), 323–344 (1987)
Article Google Scholar
Utasi, Á., Benedek, C.: A bayesian approach on people localization in multicamera systems. IEEE Trans. Circuits Syst. Video Technol. 23(1), 105–115 (2013)
Article Google Scholar
Visentini-Scarzanella, M., Stoyanov, D., Yang, G.Z.: Metric depth recovery from monocular images using shape-from-shading and specularities. In: 19th IEEE International Conference on Image Processing (ICIP), 2012, pp. 25–28. IEEE (2012)
Wang, B., Wang, G., Chan, K.L., Wang, L.: Tracklet association by online target-specific metric learning and coherent dynamics estimation. IEEE Trans. Pattern Anal. Mach. Intell. 39(3), 589–602 (2017)
Article Google Scholar
Wu, S., Wong, H.S., Yu, Z.: A bayesian model for crowd escape behavior detection. IEEE Trans. Circuits Syst. Video Technol. 24(1), 85–98 (2014)
Article Google Scholar
Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)
Article Google Scholar
Xiao, C.B., Feng, D.Z., Yuan, M.D.: An efficient fundamental matrix estimation method for wide baseline images. Pattern Analysis and Applications pp. 1–10 (2016)
Ye, M., Giannarou, S., Meining, A., Yang, G.Z.: Online tracking and retargeting with applications to optical biopsy in gastrointestinal endoscopic examinations. Med. Image Anal. 30, 144–157 (2016)
Article Google Scholar
Ye, M., Giannarou, S., Patel, N., Teare, J., Yang, G.Z.: Pathological site retargeting under tissue deformation using geometrical association and tracking. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 67–74. Springer (2013)
Zamir, A.R., Dehghan, A., Shah, M.: Gmcp-tracker: Global multi-object tracking using generalized minimum clique graphs. In: Computer Vision–ECCV 2012, pp. 343–356. Springer (2012)
Zhang, Z.: Determining the epipolar geometry and its uncertainty: a review. Int. J. Comp. Vis. 27(2), 161–195 (1998)
Article Google Scholar

Download references

Acknowledgements

The authors gratefully acknowledge the support of Regent’s Park Mosque for providing access to the site during data collection, and of K. Kiyani. This work was partly funded by ANR grant ANR-15-CE39-0005 and by QNRF grant NPRP-09-768-1-114.

Author information

Authors and Affiliations

SATIE, Université Paris-Sud, Université Paris-Saclay, Rue Noetzlin, Gif-sur-Yvette, 91190, France
Nicola Pellicanò, Emanuel Aldea & Sylvie Le Hégarat-Mascle

Authors

Nicola Pellicanò
View author publications
You can also search for this author in PubMed Google Scholar
Emanuel Aldea
View author publications
You can also search for this author in PubMed Google Scholar
Sylvie Le Hégarat-Mascle
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nicola Pellicanò.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Pellicanò, N., Aldea, E. & Le Hégarat-Mascle, S. Wide baseline pose estimation from video with a density-based uncertainty model. Machine Vision and Applications 30, 1041–1059 (2019). https://doi.org/10.1007/s00138-019-01036-6

Download citation

Received: 26 January 2018
Revised: 26 April 2019
Accepted: 23 May 2019
Published: 13 June 2019
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s00138-019-01036-6

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Wide baseline pose estimation from video with a density-based uncertainty model

Abstract

Access this article

Similar content being viewed by others

Scale Drift Correction of Camera Geo-Localization Using Geo-Tagged Images

Video Registration to SfM Models

CRF-Based Reconstruction from Narrow-Baseline Image Sequences

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Wide baseline pose estimation from video with a density-based uncertainty model

Abstract

Access this article

Similar content being viewed by others

Scale Drift Correction of Camera Geo-Localization Using Geo-Tagged Images

Video Registration to SfM Models

CRF-Based Reconstruction from Narrow-Baseline Image Sequences

Notes

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation