
Cut-in maneuver detection with self-supervised contrastive video representation learning

Original Paper · Signal, Image and Video Processing

Abstract

Detecting the maneuvers of surrounding vehicles is essential for an autonomous vehicle to act in time and avoid possible accidents. This study proposes a framework based on contrastive representation learning to detect potentially dangerous cut-in maneuvers that may occur in front of the ego vehicle. First, an encoder network is trained in a self-supervised fashion with a contrastive loss, where two augmented versions of the same video clip stay close to each other in the embedding space, while augmentations of different videos stay far apart. Since no maneuver labeling is required in this step, a relatively large dataset can be used. After this self-supervised training, the encoder is fine-tuned on our cut-in/lane-pass labeled datasets. Instead of using the original video frames, we simplify the scene by highlighting the surrounding vehicles and the ego lane. We investigate several classification heads, augmentation types, and scene simplification alternatives. The most successful model outperforms the best fully supervised model by ~2%, reaching an accuracy of 92.52%.
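To make the pretraining step concrete, below is a minimal sketch of an NT-Xent (InfoNCE) contrastive objective of the kind used in SimCLR-style frameworks, matching the behavior described above: embeddings of two augmentations of the same clip are pulled together, while the other clips in the batch act as negatives. The PyTorch code, the temperature value, and the encoder call are illustrative assumptions, not the authors' exact implementation.

    import torch
    import torch.nn.functional as F

    def nt_xent_loss(z1, z2, temperature=0.1):
        # z1, z2: (N, D) embeddings of two augmented views of the same N clips.
        # Positive pairs are (z1[i], z2[i]); every other clip in the batch
        # serves as a negative. Temperature is an illustrative choice.
        n = z1.size(0)
        z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2N, D), unit norm
        sim = z @ z.t() / temperature                       # scaled cosine similarity
        sim.fill_diagonal_(float('-inf'))                   # exclude self-similarity
        # Row i (for i < N) has its positive at column i + N, and vice versa.
        targets = torch.cat([torch.arange(n) + n, torch.arange(n)]).to(z.device)
        return F.cross_entropy(sim, targets)

    # Illustrative pretraining step (encoder, augmentations, and optimizer
    # are placeholders, not the paper's exact components):
    #   z1, z2 = encoder(aug_a(clips)), encoder(aug_b(clips))
    #   loss = nt_xent_loss(z1, z2)
    #   loss.backward(); optimizer.step()

After pretraining, the projection head would typically be discarded and a small classification head (e.g., a linear or MLP layer) attached to the encoder for fine-tuning on the labeled cut-in/lane-pass clips, as the abstract describes.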


Data availability

The labeled and unlabeled simplified scene representation data are publicly available on GitHub.


Funding

Y. Nalcakan is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) 2244 Scholarship (Grant No. 2244-118C079). The numerical calculations reported in this paper were fully performed at the TUBITAK ULAKBIM High Performance and Grid Computing Center (TRUBA resources).

Author information

Contributions

Y. Nalcakan prepared all of the datasets, implemented the machine learning methods, and performed the experiments. Both authors wrote the manuscript, prepared the figures, and designed the detailed steps of the work. We confirm that the manuscript has been read and approved by both authors and that there are no other persons who satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by both of us.

Corresponding author

Correspondence to Yagiz Nalcakan.

Ethics declarations

Conflict of interest

We wish to confirm that there are no known conflicts of interest associated with this publication.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Nalcakan, Y., Bastanlar, Y. Cut-in maneuver detection with self-supervised contrastive video representation learning. SIViP 17, 2915–2923 (2023). https://doi.org/10.1007/s11760-023-02512-3

