Abstract
We present Poly-YOLO, a new version of YOLO with better performance, extended with instance segmentation. Poly-YOLO builds on the original ideas of YOLOv3 and removes two of its weaknesses: a large number of rewritten labels and an inefficient distribution of anchors. Poly-YOLO reduces these issues by aggregating features from a light SE-Darknet-53 backbone with a hypercolumn technique, using stairstep upsampling, and producing a single-scale output with high resolution. In comparison with YOLOv3, Poly-YOLO has only 60% of its trainable parameters but improves the mean average precision by a relative 40%. We also present Poly-YOLO lite, with fewer parameters and a lower output resolution. It has the same precision as YOLOv3, but it is three times smaller and twice as fast, and therefore suitable for embedded devices. Finally, Poly-YOLO performs instance segmentation by bounding polygons. The network is trained to detect size-independent polygons defined on a polar grid. The vertices of each polygon are predicted together with their confidences, and therefore Poly-YOLO produces polygons with a varying number of vertices. Source code is available at https://gitlab.com/irafm-ai/poly-yolo.
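To illustrate the polygon representation described above, the following is a minimal sketch of how per-sector polar predictions might be decoded into a bounding polygon. The function name, the sector layout, and the confidence threshold are our assumptions for illustration, not the paper's exact formulation: each angular bin of the polar grid carries a (distance, confidence) pair relative to the box centre, and a vertex is emitted only when its confidence is high enough, yielding polygons with a varying number of vertices.

```python
import math

def decode_polygon(cx, cy, sectors, conf_threshold=0.5):
    """Decode a bounding polygon from per-sector polar predictions.

    `sectors` is a list of (distance, confidence) pairs, one per
    angular bin of a polar grid centred on (cx, cy).  A vertex is
    emitted only when its confidence exceeds the threshold, so the
    resulting polygon has a varying number of vertices.
    This is an illustrative sketch, not the authors' implementation.
    """
    n = len(sectors)
    vertices = []
    for i, (dist, conf) in enumerate(sectors):
        if conf < conf_threshold:
            continue
        # each sector spans 2*pi/n radians; place the vertex at its centre angle
        angle = (i + 0.5) * 2.0 * math.pi / n
        vertices.append((cx + dist * math.cos(angle),
                         cy + dist * math.sin(angle)))
    return vertices
```

For example, with four sectors of which three exceed the threshold, `decode_polygon(0.0, 0.0, [(1.0, 0.9), (1.0, 0.2), (1.0, 0.8), (1.0, 0.95)])` yields a triangle. Because the distances are relative to the detected box, the same grid works for objects of any size.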
Acknowledgements
The work is supported by ERDF/ESF “Centre for the development of Artificial Intelligence Methods for the Automotive Industry of the region” (No. CZ.02.1.01/0.0/0.0/17049/0008414).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal participants
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hurtik, P., Molek, V., Hula, J. et al. Poly-YOLO: higher speed, more precise detection and instance segmentation for YOLOv3. Neural Comput & Applic 34, 8275–8290 (2022). https://doi.org/10.1007/s00521-021-05978-9