Deep Networks for Monitoring Waterway Traffic in the Mekong Delta

Do, Thanh-Nghi; Tran-Nguyen, Minh-Thu; Trang, Thanh-Tri; Vo, Tri-Thuc

doi:10.1007/978-3-030-92666-3_27

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 363)

Included in the following conference series:

International Conference on Modelling, Computation and Optimization in Information Systems and Management Sciences

Abstract

Our investigation aims to train deep networks for monitoring the means of waterway transport on rivers in the Mekong Delta. We collected real videos of waterway traffic and tagged the five most popular means of transport in frames extracted from the videos to build an image dataset. We propose training recent deep network models such as YOLO v4 (You Only Look Once), RetinaNet and EfficientDet on this image dataset to detect the five most popular means of transport in the videos. The numerical test results show that YOLO v4 achieves higher accuracy than the two other methods, RetinaNet and EfficientDet. On the test set, YOLO v4 achieves a precision of 91%, a recall of 98%, an F1-score of 94% and a mean average precision (mAP@0.50) of 97.51%.
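
The reported metrics can be related to each other with a short sketch. It is not the authors' evaluation code, which this page does not show. It assumes the standard definitions: the F1-score is the harmonic mean of precision and recall, and mAP@0.50 counts a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.50. The function names and the two example boxes are hypothetical and chosen only for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Precision and recall as reported in the abstract for YOLO v4.
reported_f1 = f1_score(0.91, 0.98)
print(f"F1 = {reported_f1:.4f}")

# Hypothetical boxes (not from the paper): a labelled boat and a prediction
# shifted 10 pixels to the right.
ground_truth = (10, 10, 110, 60)
prediction = (20, 10, 120, 60)
overlap = iou(ground_truth, prediction)
is_match_at_050 = overlap >= 0.50
print(f"IoU = {overlap:.4f}, counted as correct at 0.50: {is_match_at_050}")
```

Under these assumptions, a precision of 91% and a recall of 98% give an F1-score that rounds to the 94% stated in the abstract.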

Acknowledgments

This work has received support from the College of Information Technology, Can Tho University. The authors are very grateful to the Big Data and Mobile Computing Laboratory.

Author information

Authors and Affiliations

College of Information Technology, Can Tho University, Cantho, 92000, Vietnam
Thanh-Nghi Do, Minh-Thu Tran-Nguyen, Thanh-Tri Trang & Tri-Thuc Vo
UMI UMMISCO 209 (IRD/UPMC), Sorbonne University, Pierre and Marie Curie University, Paris 6, France
Thanh-Nghi Do

Corresponding author

Correspondence to Thanh-Nghi Do.

Editor information

Editors and Affiliations

Computer science and Applications Department, LGIPM, University of Lorraine, Metz Cedex, France
Hoai An Le Thi
Laboratory of Mathematics, National Institute for Applied Sciences - Rouen, Saint-Etienne-du-Rouvray Cedex, France
Tao Pham Dinh
Computer science and Applications Department, LGIPM, University of Lorraine, Metz Cedex, France
Hoai Minh Le

About this paper

Cite this paper

Do, TN., Tran-Nguyen, MT., Trang, TT., Vo, TT. (2022). Deep Networks for Monitoring Waterway Traffic in the Mekong Delta. In: Le Thi, H.A., Pham Dinh, T., Le, H.M. (eds) Modelling, Computation and Optimization in Information Systems and Management Sciences. MCO 2021. Lecture Notes in Networks and Systems, vol 363. Springer, Cham. https://doi.org/10.1007/978-3-030-92666-3_27

DOI: https://doi.org/10.1007/978-3-030-92666-3_27
Published: 08 December 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-92665-6
Online ISBN: 978-3-030-92666-3
eBook Packages: Intelligent Technologies and Robotics (R0)
