Enhancing collaborative road scene reconstruction with unsupervised domain alignment

Abstract

Scene reconstruction and visual localization in dynamic environments such as street scenes are challenging due to the lack of distinctive, stable keypoints. While learned convolutional features have proven robust to changes in viewing conditions, handcrafted features still have advantages in distinctiveness and accuracy when applied to structure from motion. For collaborative reconstruction of road sections by a car fleet, we propose multimodal domain adaptation as a preprocessing step that aligns images in appearance and enhances keypoint matching across viewing conditions while preserving the advantages of handcrafted features. We train a generative adversarial network to translate between different illumination and weather conditions and evaluate qualitative and quantitative aspects of domain adaptation and its impact on feature correspondences. Combined with a multi-feature discriminator, the model is optimized to synthesize images that not only improve feature matching but also exhibit high visual quality. Experiments on a challenging multi-domain dataset recorded in various road scenes over multiple test drives show that our approach outperforms other traditional and learning-based methods, improving the completeness or accuracy of structure from motion with multimodal two-domain image collections in eight out of ten test scenes.
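
As an illustration of how the impact of domain alignment on feature correspondences might be quantified, the following is a minimal sketch and not the authors' implementation. It assumes that descriptors have already been extracted as plain numeric vectors, and it uses nearest-neighbour matching with Lowe's ratio test, a common choice that the abstract does not confirm. The function name `ratio_test_matches`, the threshold of 0.8, and the toy descriptors are all hypothetical.

```python
import math


def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Return (i, j) pairs where desc_a[i] is matched to desc_b[j].

    A match is kept only if the nearest neighbour in desc_b is closer than
    `ratio` times the second-nearest neighbour (Lowe's ratio test).
    The threshold 0.8 is an assumed default, not a value from the paper.
    """
    matches = []
    if len(desc_b) < 2:
        return matches  # the ratio test needs at least two candidates
    for i, a in enumerate(desc_a):
        # Euclidean distance from descriptor a to every descriptor in desc_b
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        (d1, j1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches


# Hypothetical 2-D toy descriptors standing in for real feature vectors.
desc_day = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
# Untranslated night image: descriptors are ambiguous with respect to the day set.
desc_night_raw = [(5.0, 5.0), (5.0, 4.0), (4.0, 5.0)]
# Night image after a hypothetical appearance translation towards the day domain.
desc_night_translated = [(0.5, 0.0), (10.0, 0.5), (0.0, 9.5)]

raw_count = len(ratio_test_matches(desc_night_raw, desc_day))
translated_count = len(ratio_test_matches(desc_night_translated, desc_day))
print(raw_count, translated_count)
```

In this toy setup, the match count before and after translation serves as a simple proxy for how many cross-domain correspondences would be available to structure from motion. A real evaluation would additionally verify matches geometrically, for example with RANSAC.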

Author information

Corresponding author

Correspondence to Moritz Venator.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 30871 KB)

About this article

Cite this article

Venator, M., Aklanoglu, S., Bruns, E. et al. Enhancing collaborative road scene reconstruction with unsupervised domain alignment. Machine Vision and Applications 32, 13 (2021). https://doi.org/10.1007/s00138-020-01144-8

  • DOI: https://doi.org/10.1007/s00138-020-01144-8

Keywords

  • Domain adaptation
  • Image-to-image translation
  • Structure from Motion
  • Scene reconstruction
  • Visual localization