A Study of Using Synthetic Data for Effective Association Knowledge Learning

Liu, Yuchi; Wang, Zhongdao; Zhou, Xiangxin; Zheng, Liang

doi:10.1007/s11633-022-1380-x

Research Article
Open access
Published: 08 March 2023

Volume 20, pages 194–206, (2023)
Machine Intelligence Research

Abstract

Association, which aims to link bounding boxes of the same identity across a video sequence, is a central component of multi-object tracking (MOT). Association modules, e.g., parametric networks, are usually trained on real video data. However, annotating person tracks in consecutive video frames is expensive, and real data, being inflexible, offer limited opportunities to evaluate system performance under changing tracking scenarios. In this paper, we study whether 3D synthetic data can replace real-world videos for association training. Specifically, we introduce a large-scale synthetic data engine named MOTX, in which the motion characteristics of cameras and objects are manually configured to be similar to those of real-world datasets. We show that association knowledge learned from synthetic data achieves performance very similar to that learned from real data on real-world test sets, without domain adaptation techniques. We attribute this intriguing observation to two factors. First and foremost, 3D engines can simulate motion factors such as camera movement, camera view, and object movement well, so the simulated videos provide association modules with effective motion features. Second, the experimental results show that the appearance domain gap hardly harms the learning of association knowledge. In addition, the strong customization ability of MOTX allows us to quantitatively assess the impact of motion factors on MOT, which brings new insights to the community.

Acknowledgements

This work was supported by the ARC Discovery Early Career Researcher Award, China (No. DE200101283) and the ARC Discovery Project, China (No. DP210102801).

Author information

Authors and Affiliations

College of Engineering & Computer Science, Australian National University, Canberra, 2601, Australia
Yuchi Liu & Liang Zheng
Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
Zhongdao Wang & Xiangxin Zhou

Corresponding author

Correspondence to Liang Zheng.

Additional information

Declarations

Competing interests. The authors have no competing interests to declare that are relevant to the content of this article.

Yuchi Liu received the B. Eng. degree in software engineering from Australian National University, Australia in 2018. He is currently a Ph. D. degree candidate in computer science at Australian National University, Australia.

His research interests include video object tracking, learning from synthetic data, and weakly supervised learning.

Zhongdao Wang received the B. Sc. degree in physics from the Department of Physics, Tsinghua University, China in 2017. He is currently a Ph. D. degree candidate in electronic engineering at the Department of Electronic Engineering, Tsinghua University, China.

His research interests include perception algorithms for autonomous driving, including but not limited to 3D object detection/tracking, network architecture/learning algorithm/pre-training for multimodal fusion, and 4D Auto-labeling.

Xiangxin Zhou received the B. Sc. degree in electronic engineering from Department of Electronic Engineering, Tsinghua University, China in 2021. He is currently a Ph. D. degree candidate in artificial intelligence at School of Artificial Intelligence, University of Chinese Academy of Sciences (UCAS), China, and Institute of Automation, Chinese Academy of Sciences (CASIA), China.

His research interests include geometric deep learning, graph neural networks for drug design, causal inference, and multimodal machine learning.

Liang Zheng received the B. Eng. degree in life science from Tsinghua University, China in 2010, and the Ph. D. degree in electronic engineering from Tsinghua University, China in 2015. He is a lecturer and a Computer Science Futures Fellow in the School of Computer Science, Australian National University, Australia.

His research interests include computer vision, machine learning, object re-identification and dataset-centered vision.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Liu, Y., Wang, Z., Zhou, X. et al. A Study of Using Synthetic Data for Effective Association Knowledge Learning. Mach. Intell. Res. 20, 194–206 (2023). https://doi.org/10.1007/s11633-022-1380-x

Received: 19 July 2022
Accepted: 13 October 2022
Published: 08 March 2023
Issue Date: April 2023
DOI: https://doi.org/10.1007/s11633-022-1380-x
