Abstract
Detecting the maneuvers of surrounding vehicles is important for autonomous vehicles, which must act accordingly to avoid possible accidents. This study proposes a framework based on contrastive representation learning to detect potentially dangerous cut-in maneuvers that can occur in front of the ego vehicle. First, the encoder network is trained in a self-supervised fashion with a contrastive loss, where two augmented versions of the same video clip stay close to each other in the embedding space, while augmentations of different videos stay far apart. Since no maneuver labeling is required in this step, a relatively large dataset can be used. After this self-supervised training, the encoder is fine-tuned on our cut-in/lane-pass labeled datasets. Instead of using the original video frames, we simplify the scene by highlighting the surrounding vehicles and the ego lane. We investigate several classification heads, augmentation types, and scene-simplification alternatives. The most successful model outperforms the best fully supervised model by ~2%, reaching an accuracy of 92.52%.
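The self-supervised objective described above, where augmentations of the same clip are pulled together and those of different clips pushed apart, matches the SimCLR-style NT-Xent loss. Below is a minimal NumPy sketch of that loss under this assumption; the paper's exact formulation, batch construction, and temperature may differ, and `nt_xent_loss` is a hypothetical helper name.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """SimCLR-style NT-Xent loss (a sketch, not the paper's exact code).

    z1, z2: (N, D) embeddings of two augmented views of the same N clips,
    so the positive pair of row i in z1 is row i in z2.
    """
    z = np.concatenate([z1, z2], axis=0)              # (2N, D) all views
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize embeddings
    sim = (z @ z.T) / temperature                     # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = z1.shape[0]
    # index of each sample's positive pair: i <-> i + n
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # log-softmax over each row, then pick the positive-pair entry
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos_idx].mean()
```

As a sanity check, embeddings of near-identical views should yield a lower loss than embeddings paired with unrelated random vectors, which is exactly the behavior the pretraining step relies on.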
Data availability
The labeled and unlabeled simplified scene representation data are publicly available on GitHub.
Funding
Y. Nalcakan is supported by the Scientific and Technological Research Council of Turkey (TUBITAK) 2244 Scholarship, Grant No. 2244-118C079. The numerical calculations reported in this paper were performed entirely at the TUBITAK ULAKBIM High Performance and Grid Computing Center (TRUBA resources).
Author information
Contributions
Y. Nalcakan prepared all of the datasets, implemented the machine learning methods, and performed the experiments. Both authors wrote the manuscript, prepared the figures, and designed the detailed steps of the work. We confirm that the manuscript has been read and approved by both authors and that no other persons satisfied the criteria for authorship but are not listed. We further confirm that the order of authors listed in the manuscript has been approved by all of us.
Ethics declarations
Conflict of interest
The authors declare that there are no known conflicts of interest associated with this publication.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nalcakan, Y., Bastanlar, Y. Cut-in maneuver detection with self-supervised contrastive video representation learning. SIViP 17, 2915–2923 (2023). https://doi.org/10.1007/s11760-023-02512-3