SimAug: Learning Robust Representations from Simulation for Trajectory Prediction

Liang, Junwei; Jiang, Lu; Hauptmann, Alexander

doi:10.1007/978-3-030-58601-0_17

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12358))

Included in the following conference series:

European Conference on Computer Vision

3582 Accesses
37 Citations

Abstract

This paper studies the problem of predicting future trajectories of people in unseen cameras of novel scenarios and views. We approach this problem through the real-data-free setting in which the model is trained only on 3D simulation data and applied out-of-the-box to a wide variety of real cameras. We propose a novel approach to learn robust representation through augmenting the simulation training data such that the representation can better generalize to unseen real-world test data. The key idea is to mix the feature of the hardest camera view with the adversarial feature of the original view. We refer to our method as SimAug. We show that SimAug achieves promising results on three real-world benchmarks using zero real training data, and state-of-the-art performance in the Stanford Drone and the VIRAT/ActEV dataset when using in-domain training data. Code and models are released at https://next.cs.cmu.edu/simaug.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Relative Camera Pose Estimation using Synthetic Data with Domain Adaptation via Cycle-Consistent Adversarial Networks

Article Open access 08 July 2021

$$S^3$$ Net: Semantic-Aware Self-supervised Depth Estimation with Monocular Videos and Synthetic Data

An Adversarial Learned Trajectory Predictor with Knowledge-Rich Latent Variables

References

Alahi, A., Goel, K., Ramanathan, V., Robicquet, A., Fei-Fei, L., Savarese, S.: Social LSTM: human trajectory prediction in crowded spaces. In: CVPR (2016)
Google Scholar
Awad, G., et al.: TRECVID 2018: benchmarking video activity detection, video captioning and matching, video storytelling linking and video search. In: TRECVID (2018)
Google Scholar
Bąk, S., Carr, P., Lalonde, J.-F.: Domain adaptation through synthesis for unsupervised person re-identification. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 193–209. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_12
Chapter Google Scholar
Bansal, M., Krizhevsky, A., Ogale, A.: ChauffeurNet: learning to drive by imitating the best and synthesizing the worst. arXiv preprint arXiv:1812.03079 (2018)
Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., Krishnan, D.: Unsupervised pixel-level domain adaptation with generative adversarial networks. In: CVPR (2017)
Google Scholar
Caesar, H., et al.: nuScenes: a multimodal dataset for autonomous driving. arXiv preprint arXiv:1903.11027 (2019)
Chai, Y., Sapp, B., Bansal, M., Anguelov, D.: MultiPath: multiple probabilistic anchor trajectory hypotheses for behavior prediction. arXiv preprint arXiv:1910.05449 (2019)
Chang, M.F., et al.: Argoverse: 3D tracking and forecasting with rich maps. In: CVPR (2019)
Google Scholar
Chen, H., et al.: Data-free learning of student networks. In: ICCV (2019)
Google Scholar
Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2017)
Article Google Scholar
Cheng, Y., Jiang, L., Macherey, W.: Robust neural machine translation with doubly adversarial inputs. In: ACL (2019)
Google Scholar
Cheng, Y., Jiang, L., Macherey, W., Eisenstein, J.: AdvAug: robust data augmentation for neural machine translation. In: ACL (2020)
Google Scholar
Deo, N., Trivedi, M.M.: Trajectory forecasts in unknown environments conditioned on grid-based plans. arXiv preprint arXiv:2001.00735 (2020)
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., Koltun, V.: CARLA: an open urban driving simulator. arXiv preprint arXiv:1711.03938 (2017)
Gaidon, A., Wang, Q., Cabon, Y., Vig, E.: Virtual worlds as proxy for multi-object tracking analysis. In: CVPR (2016)
Google Scholar
Ganin, Y., et al.: Domain-adversarial training of neural networks. J. Mach. Learn. Res. 17(1), 2096-2030 (2016)
MathSciNet Google Scholar
Goodfellow, I., et al.: Generative adversarial nets. In: NeurIPS (2014)
Google Scholar
Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014)
Gupta, A., Johnson, J., Savarese, S., Fei-Fei, L., Alahi, A.: Social GAN: socially acceptable trajectories with generative adversarial networks. In: CVPR (2018)
Google Scholar
Heess, N., et al.: Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286 (2017)
Hong, J., Sapp, B., Philbin, J.: Rules of the road: predicting driving behavior with a convolutional model of semantic interactions. In: CVPR (2019)
Google Scholar
Jiang, L., Huang, D., Liu, M., Yang, W.: Beyond synthetic noise: deep learning on controlled noisy labels. In: ICML (2020)
Google Scholar
Jiang, L., Meng, D., Zhao, Q., Shan, S., Hauptmann, A.G.: Self-paced curriculum learning. In: AAAI (2015)
Google Scholar
Jiang, L., Zhou, Z., Leung, T., Li, L.J., Fei-Fei, L.: MentorNet: learning data-driven curriculum for very deep neural networks on corrupted labels. In: ICML (2018)
Google Scholar
Kang, G., Jiang, L., Yang, Y., Hauptmann, A.G.: Contrastive adaptation network for unsupervised domain adaptation. In: CVPR (2019)
Google Scholar
Kar, A., et al.: Meta-Sim: learning to generate synthetic datasets. In: ICCV (2019)
Google Scholar
Kitani, K.M., Ziebart, B.D., Bagnell, J.A., Hebert, M.: Activity forecasting. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 201–214. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_15
Chapter Google Scholar
Kooij, J.F.P., Schneider, N., Flohr, F., Gavrila, D.M.: Context-based pedestrian path prediction. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8694, pp. 618–633. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10599-4_40
Chapter Google Scholar
Kurakin, A., Goodfellow, I., Bengio, S.: Adversarial examples in the physical world. In: ICLR (2017)
Google Scholar
Lambert, J., Sener, O., Savarese, S.: Deep learning under privileged information using heteroscedastic dropout. In: CVPR (2018)
Google Scholar
Lee, N., Choi, W., Vernaza, P., Choy, C.B., Torr, P.H., Chandraker, M.: DESIRE: distant future prediction in dynamic scenes with interacting agents. In: CVPR (2017)
Google Scholar
Lerner, A., Chrysanthou, Y., Lischinski, D.: Crowds by example. In: Computer Graphics Forum, pp. 655–664. Wiley Online Library (2007)
Google Scholar
Li, Y.: Which way are you going? Imitative decision learning for path forecasting in dynamic scenes. In: CVPR (2019)
Google Scholar
Liang, J., et al.: An event reconstruction tool for conflict monitoring using social media. In: AAAI (2017)
Google Scholar
Liang, J., Jiang, L., Cao, L., Kalantidis, Y., Li, L.J., Hauptmann, A.G.: Focal visual-text attention for memex question answering. IEEE Trans. Pattern Anal. Mach. Intell. 41(8), 1893–1908 (2019)
Article Google Scholar
Liang, J., Jiang, L., Meng, D., Hauptmann, A.G.: Learning to detect concepts from webly-labeled video data. In: IJCAI (2016)
Google Scholar
Liang, J., Jiang, L., Murphy, K., Yu, T., Hauptmann, A.: The garden of forking paths: towards multi-future trajectory prediction. In: CVPR (2020)
Google Scholar
Liang, J., Jiang, L., Niebles, J.C., Hauptmann, A.G., Fei-Fei, L.: Peeking into the future: predicting future person activities and locations in videos. In: CVPR (2019)
Google Scholar
Lopez-Paz, D., Bottou, L., Schölkopf, B., Vapnik, V.: Unifying distillation and privileged information. arXiv preprint arXiv:1511.03643 (2015)
Luber, M., Stork, J.A., Tipaldi, G.D., Arras, K.O.: People tracking with human motion predictions from social forces. In: ICRA (2010)
Google Scholar
Luo, Z., Hsieh, J.-T., Jiang, L., Niebles, J.C., Fei-Fei, L.: Graph distillation for action detection with privileged modalities. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision – ECCV 2018. LNCS, vol. 11218, pp. 174–192. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01264-9_11
Ma, W.C., Huang, D.A., Lee, N., Kitani, K.M.: Forecasting interactive dynamics of pedestrians with fictitious play. In: CVPR (2017)
Google Scholar
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
Makansi, O., Ilg, E., Cicek, O., Brox, T.: Overcoming limitations of mixture density networks: a sampling and fitting framework for multimodal future prediction. In: CVPR (2019)
Google Scholar
Mangalam, K., Adeli, E., Lee, K.H., Gaidon, A., Niebles, J.C.: Disentangling human dynamics for pedestrian locomotion forecasting with noisy supervision. arXiv preprint arXiv:1911.01138 (2019)
Northcutt, C.G., Jiang, L., Chuang, I.L.: Confident learning: estimating uncertainty in dataset labels. arXiv preprint arXiv:1911.00068 (2019)
Oh, S., et al.: A large-scale benchmark dataset for event recognition in surveillance video. In: CVPR (2011)
Google Scholar
Qiu, W., et al.: UnrealCV: virtual worlds for computer vision. In: ACM Multimedia (2017)
Google Scholar
Ren, M., Zeng, W., Yang, B., Urtasun, R.: Learning to reweight examples for robust deep learning. In: ICML (2018)
Google Scholar
Rhinehart, N., Kitani, K.M.: First-person activity forecasting with online inverse reinforcement learning. In: ICCV (2017)
Google Scholar
Rhinehart, N., Kitani, K.M., Vernaza, P.: r2p2: a reparameterized pushforward policy for diverse, precise generative path forecasting. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11217, pp. 794–811. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01261-8_47
Chapter Google Scholar
Richter, S.R., Vineet, V., Roth, S., Koltun, V.: Playing for data: ground truth from computer games. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9906, pp. 102–118. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46475-6_7
Chapter Google Scholar
Robicquet, A., Sadeghian, A., Alahi, A., Savarese, S.: Learning social etiquette: human trajectory understanding in crowded scenes. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 549–565. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_33
Chapter Google Scholar
Ros, G., Sellart, L., Materzynska, J., Vazquez, D., Lopez, A.M.: The synthia dataset: a large collection of synthetic images for semantic segmentation of urban scenes. In: CVPR (2016)
Google Scholar
Ruiz, N., Schulter, S., Chandraker, M.: Learning to simulate. arXiv preprint arXiv:1810.02513 (2018)
Sadeghian, A., Kosaraju, V., Sadeghian, A., Hirose, N., Savarese, S.: SoPhie: an attentive GAN for predicting paths compliant to social and physical constraints. arXiv preprint arXiv:1806.01482 (2018)
Sadeghian, A., Legros, F., Voisin, M., Vesel, R., Alahi, A., Savarese, S.: CAR-Net: clairvoyant attentive recurrent network. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 162–180. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_10
Chapter Google Scholar
Shah, S., Dey, D., Lovett, C., Kapoor, A.: AirSim: high-fidelity visual and physical simulation for autonomous vehicles. In: Hutter, M., Siegwart, R. (eds.) Field and Service Robotics. SPAR, vol. 5, pp. 621–635. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-67361-5_40
Chapter Google Scholar
Souza, C.R., Gaidon, A., Cabon, Y., López, A.M.: Procedural generation of videos to train deep action recognition networks. In: CVPR (2017)
Google Scholar
Styles, O., Ross, A., Sanchez, V.: Forecasting pedestrian trajectory with machine-annotated training data. In: 2019 IEEE Intelligent Vehicles Symposium (IV), pp. 716–721. IEEE (2019)
Google Scholar
Styles, O., Guha, T., Sanchez, V.: Multiple object forecasting: predicting future object locations in diverse environments. arXiv preprint arXiv:1909.11944 (2019)
Sun, C., Karlsson, P., Wu, J., Tenenbaum, J.B., Murphy, K.: Stochastic prediction of multi-agent interactions from partial observations. arXiv preprint arXiv:1902.09641 (2019)
Sun, S.-H., Huh, M., Liao, Y.-H., Zhang, N., Lim, J.J.: Multi-view to novel view: synthesizing novel views with self-learned confidence. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11207, pp. 162–178. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01219-9_10
Chapter Google Scholar
Tramèr, F., Kurakin, A., Papernot, N., Goodfellow, I., Boneh, D., McDaniel, P.: Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204 (2017)
Tzeng, E., Hoffman, J., Saenko, K., Darrell, T.: Adversarial discriminative domain adaptation. In: CVPR (2017)
Google Scholar
Vapnik, V., Izmailov, R.: Learning using privileged information: similarity control and knowledge transfer. J. Mach. Learn. Res. 16(2023–2049), 2 (2015)
MathSciNet MATH Google Scholar
Varol, G., Laptev, I., Schmid, C., Zisserman, A.: Synthetic humans for action recognition from unseen viewpoints. arXiv preprint arXiv:1912.04070 (2019)
Wang, Y., Jiang, L., Yang, M.H., Li, L.J., Long, M., Fei-Fei, L.: Eidetic 3D LSTM: a model for video prediction and beyond. In: ICLR (2019)
Google Scholar
Wu, Y., Jiang, L., Yang, Y.: Revisiting embodiedqa: a simple baseline and beyond. IEEE Trans. Image Process. 29, 3984–3992 (2020)
Article Google Scholar
Xie, C., Wu, Y., van der Maaten, L., Yuille, A.L., He, K.: Feature denoising for improving adversarial robustness. In: CVPR (2019)
Google Scholar
Xingjian, S., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.C.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: NeurIPS (2015)
Google Scholar
Xue, H., Huynh, D.Q., Reynolds, M.: SS-LSTM: a hierarchical LSTM model for pedestrian trajectory prediction. In: WACV (2018)
Google Scholar
Yagi, T., Mangalam, K., Yonetani, R., Sato, Y.: Future person localization in first-person videos. In: CVPR (2018)
Google Scholar
Yu, F., et al.: BDD100K: a diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687 (2018)
Zeiler, M.D.: ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)
Zeng, X., et al.: Adversarial attacks beyond the image space. In: CVPR (2019)
Google Scholar
Zhang, H., Cisse, M., Dauphin, Y.N., Lopez-Paz, D.: mixup: Beyond empirical risk minimization. In: ICLR (2018)
Google Scholar
Zhang, P., Ouyang, W., Zhang, P., Xue, J., Zheng, N.: SR-LSTM: state refinement for LSTM towards pedestrian trajectory prediction. In: CVPR (2019)
Google Scholar
Zhang, Y., Wei, X., Qiu, W., Xiao, Z., Hager, G.D., Yuille, A.: RSA: randomized simulation as augmentation for robust human action recognition. arXiv preprint arXiv:1912.01180 (2019)
Zhang, Y., Gibson, G.M., Hay, R., Bowman, R.W., Padgett, M.J., Edgar, M.P.: A fast 3D reconstruction system with a low-cost camera accessory. Sci. Rep. 5, 10909 (2015)
Article Google Scholar
Zhao, T., et al.: Multi-agent tensor fusion for contextual trajectory prediction. In: CVPR (2019)
Google Scholar
Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: CVPR (2017)
Google Scholar
Zhu, Y., et al.: Target-driven visual navigation in indoor scenes using deep reinforcement learning. In: ICRA (2017)
Google Scholar

Download references

Acknowledgements

We would like to thank the anonymous reviewers for their useful comments, and Google Cloud for providing GCP research credits. This research was supported by NSF grant IIS-1650994, the financial assistance award 60NANB17D156 from NIST, and the Baidu Scholarship. This work was also supported by IARPA via DOI/IBC contract number D17PC00340. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, NIST, DOI/IBC, the National Science Foundation, or the U.S. Government.

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, USA
Junwei Liang & Alexander Hauptmann
Google Research, Mountain View, USA
Lu Jiang

Authors

Junwei Liang
View author publications
You can also search for this author in PubMed Google Scholar
Lu Jiang
View author publications
You can also search for this author in PubMed Google Scholar
Alexander Hauptmann
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Junwei Liang .

Editor information

Editors and Affiliations

University of Oxford, Oxford, UK
Andrea Vedaldi
Graz University of Technology, Graz, Austria
Horst Bischof
University of Freiburg, Freiburg im Breisgau, Germany
Thomas Brox
University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
Jan-Michael Frahm

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Liang, J., Jiang, L., Hauptmann, A. (2020). SimAug: Learning Robust Representations from Simulation for Trajectory Prediction. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, JM. (eds) Computer Vision – ECCV 2020. ECCV 2020. Lecture Notes in Computer Science(), vol 12358. Springer, Cham. https://doi.org/10.1007/978-3-030-58601-0_17

Download citation

DOI: https://doi.org/10.1007/978-3-030-58601-0_17
Published: 28 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58600-3
Online ISBN: 978-3-030-58601-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction

Abstract

Access this chapter

Similar content being viewed by others

Relative Camera Pose Estimation using Synthetic Data with Domain Adaptation via Cycle-Consistent Adversarial Networks

$$S^3$$ Net: Semantic-Aware Self-supervised Depth Estimation with Monocular Videos and Synthetic Data

An Adversarial Learned Trajectory Predictor with Knowledge-Rich Latent Variables

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction

Abstract

Access this chapter

Similar content being viewed by others

Relative Camera Pose Estimation using Synthetic Data with Domain Adaptation via Cycle-Consistent Adversarial Networks

$$S^3$$ Net: Semantic-Aware Self-supervised Depth Estimation with Monocular Videos and Synthetic Data

An Adversarial Learned Trajectory Predictor with Knowledge-Rich Latent Variables

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation