
Detecting Soccer Balls with Reduced Neural Networks

A Comparison of Multiple Architectures Under Constrained Hardware Scenarios

Journal of Intelligent & Robotic Systems



Object detection techniques that achieve state-of-the-art accuracy employ convolutional neural networks, implemented to achieve low latency on graphics processing units. Some hardware systems, such as mobile robots, operate under constrained hardware scenarios but still benefit from object detection capabilities. Multiple network models have been proposed that achieve comparable accuracy with reduced architectures and leaner operations. Motivated by the need to create a near real-time object detection system for a soccer team of mobile robots operating on x86 CPU-only embedded computers, this work analyses the average precision and inference time of multiple object detection systems in a constrained hardware setting. We train open implementations of MobileNetV2 and MobileNetV3 models with different underlying architectures, obtained by varying their input and width multipliers, as well as YOLOv3, TinyYOLOv3, YOLOv4 and TinyYOLOv4, on an annotated image dataset captured using a mobile robot. We highlight the speed/accuracy trade-off of the models by reporting their average precision on a test dataset and their inference time on videos at different resolutions, under constrained and unconstrained hardware configurations. Results show that MobileNetV3 models have a good trade-off between average precision and inference time only in constrained scenarios, while MobileNetV2 models with high width multipliers are appropriate for server-side inference. YOLO models in their official implementations are not suitable for inference on CPUs.
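To illustrate why the width and input (resolution) multipliers shrink the networks compared in the abstract, the following sketch computes the multiply-add cost of a single depthwise-separable convolution layer under both multipliers, following the cost model of the MobileNets paper [18]. This is a minimal illustration, not code from the paper; the function name and the example layer dimensions are hypothetical.

```python
# Sketch (not from the paper): cost of one depthwise-separable convolution
# under MobileNet's width multiplier (alpha) and resolution multiplier (rho),
# following the cost model in the MobileNets paper [18].

def depthwise_separable_madds(d_f, d_k, m, n, alpha=1.0, rho=1.0):
    """Multiply-adds for one depthwise-separable conv layer.

    d_f: input feature-map height/width, d_k: kernel size,
    m: input channels, n: output channels,
    alpha: width multiplier (thins the channels),
    rho: resolution multiplier (shrinks the feature map).
    """
    m_a = int(alpha * m)   # thinned input channels
    n_a = int(alpha * n)   # thinned output channels
    d_r = int(rho * d_f)   # reduced feature-map size
    depthwise = d_k * d_k * m_a * d_r * d_r   # one filter per channel
    pointwise = m_a * n_a * d_r * d_r         # 1x1 channel mixing
    return depthwise + pointwise

# Example layer (hypothetical dimensions): halving alpha cuts the dominant
# pointwise term roughly 4x, which is why thinner models run faster on CPUs.
full = depthwise_separable_madds(112, 3, 32, 64)
half = depthwise_separable_madds(112, 3, 32, 64, alpha=0.5)
print(full, half)  # prints 29302784 8228864
```

Since the pointwise term scales with alpha squared and both terms scale with rho squared, sweeping these two multipliers traces out the speed/accuracy curve the paper reports.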


Materials Availability

The scripts utilized in the experiments presented in this paper are available online. The image dataset used in the experiments was also made available online [4], along with a static copy of the aforementioned scripts.


  1. Alippi, C., Disabato, S., Roveri, M.: Moving convolutional neural networks to embedded systems: the alexnet and VGG-16 case. In: 2018 17th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), pp. 212–223 (2018).

  2. Ba, L.J., Caruana, R.: Do deep nets really need to be deep? arXiv:1312.6184[cs] (2014)

  3. Bettoni, M., Urgese, G., Kobayashi, Y., Macii, E., Acquaviva, A.: A convolutional neural network fully implemented on FPGA for embedded platforms. In: 2017 New Generation of CAS (NGCAS), pp. 49–52 (2017).

  4. Bianchi, R.A.D.C., Perico, D.H., Homem, T.P.D., da Silva, I.J., Meneghetti, D.D.R.: Open soccer ball dataset. IEEE Dataport (2020)

  5. Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv:2004.10934[cs, eess] (2020)

  6. Buciluǎ, C., Caruana, R., Niculescu-Mizil, A.: Model compression. In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’06, p 535. ACM Press, Philadelphia (2006).

  7. Canziani, A., Culurciello, E., Paszke, A.: Evaluation of neural network architectures for embedded systems. In: 2017 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–4 (2017).

  8. Cheng, Y., Wang, D., Zhou, P., Zhang, T.: A survey of model compression and acceleration for deep neural networks. IEEE Signal Process. Mag. 35(1), 126–136 (2018).


  9. Chollet, F.: Xception: deep learning with depthwise separable convolutions. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1800–1807 (2017).

  10. Courbariaux, M., Hubara, I., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural networks: training deep neural networks with weights and activations constrained to +1 or −1. arXiv:1602.02830[cs] (2016)

  11. de Oliveira, J.H.R., da Silva, I.J., Homem, T.P.D., Meneghetti, D.D.R., Perico, D.H., Bianchi, R.A.D.C.: Object detection under constrained hardware scenarios: a comparative study of reduced convolutional network architectures. In: 2019 XVI Latin American Robotics Symposium and VII Brazilian Robotics Symposium (LARS/SBR). IEEE (2019)

  12. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009).

  13. Guo, K., Sui, L., Qiu, J., Yu, J., Wang, J., Yao, S., Han, S., Wang, Y., Yang, H.: Angel-eye: a complete design flow for mapping CNN onto embedded FPGA. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37(1), 35–47 (2018).


  14. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. Advances in Neural Information Processing Systems 28, 1135–1143 (2015)


  15. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition., pp. 770–778 (2016)

  16. Hinton, G., Vinyals, O., Dean, J.: Distilling the Knowledge in a Neural Network. arXiv:1503.02531[cs, stat] (2015)

  17. Howard, A., Sandler, M., Chu, G., Chen, L.C., Chen, B., Tan, M., Wang, W., Zhu, Y., Pang, R., Vasudevan, V., Le, Q.V., Adam, H.: Searching for MobileNetV3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324 (2019).

  18. Howard, A.G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., Adam, H.: MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv:1704.04861 (2017)

  19. Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., Murphy, K.: Speed/accuracy trade-offs for modern convolutional object detectors. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2017)

  20. Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., Bengio, Y.: Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations. arXiv:1609.07061[cs] (2016)

  21. Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015).

  22. Jaramillo-Avila, U., Anderson, S.R.: Foveated image processing for faster object detection and recognition in embedded systems using deep convolutional neural networks. In: Martinez-Hernandez, U., Vouloutsi, V., Mura, A., Mangan, M., Asada, M., Prescott, T.J., Verschure, P.F. (eds.) Biomimetic and Biohybrid Systems, Lecture Notes in Computer Science, pp. 193–204. Springer International Publishing, Cham (2019).

  23. Jian, B., Yu, C., Jinshou, Y.: Neural networks with limited precision weights and its application in embedded systems. In: 2010 Second International Workshop on Education Technology and Computer Science, vol. 1, pp. 86–91 (2010).

  24. Jiang, Z., Zhao, L., Li, S., Jia, Y.: Real-time object detection method based on improved YOLOv4-tiny. arXiv:2011.04244[cs] (2020)

  25. Jiao, L., Luo, C., Cao, W., Zhou, X., Wang, L.: Accelerating low bit-width convolutional neural networks with embedded FPGA. In: 2017 27th International Conference on Field Programmable Logic and Applications (FPL), pp. 1–4 (2017).

  26. Krizhevsky, A.: Convolutional deep belief networks on CIFAR-10. Tech. rep. (2010)

  27. Li, Q., Xiao, Q., Liang, Y.: Enabling high performance deep learning networks on embedded systems. In: IECON 2017 - 43rd Annual Conference of the IEEE Industrial Electronics Society, pp. 8405–8410 (2017).

  28. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) Computer Vision – ECCV 2014, Lecture Notes in Computer Science, pp. 740–755. Springer International Publishing, Cham (2014).

  29. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: single shot multibox detector. In: Computer Vision – ECCV 2016, vol. 9905, pp. 21–37. Springer International Publishing (2016).

  30. Mao, H., Yao, S., Tang, T., Li, B., Yao, J., Wang, Y.: Towards real-time object detection on embedded systems. IEEE Transactions on Emerging Topics in Computing 6(3), 417–431 (2018).


  31. Niazi-Razavi, M., Savadi, A., Noori, H.: Toward real-time object detection on heterogeneous embedded systems. In: 2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), pp. 450–454 (2019).

  32. Qin, H., Gong, R., Liu, X., Bai, X., Song, J., Sebe, N.: Binary neural networks: a survey. Pattern Recogn. 105, 107281 (2020).


  33. Real, E., Aggarwal, A., Huang, Y., Le, Q.V.: Regularized evolution for image classifier architecture search. In: 33rd AAAI Conference on Artificial Intelligence, AAAI 2019. arXiv:1802.01548 (2019)

  34. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE (2016)

  35. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. arXiv:1612.08242[cs] (2016)

  36. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv:1804.02767[cs] (2018)

  37. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv:1506.01497 (2015)

  38. Roth, W., Schindler, G., Zöhrer, M., Pfeifenberger, L., Peharz, R., Tschiatschek, S., Fröning, H., Pernkopf, F., Ghahramani, Z.: Resource-Efficient Neural Networks for Embedded Systems. arXiv:2001.03048[cs, stat] (2020)

  39. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)

  40. Sifre, L.: Rigid-Motion Scattering for Image Classification. Ph.D. thesis, École Polytechnique, CMAP, Palaiseau, France (2014)

  41. Sze, V., Chen, Y., Yang, T., Emer, J.S.: Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105(12), 2295–2329 (2017).


  42. Tan, M., Chen, B., Pang, R., Vasudevan, V., Sandler, M., Howard, A., Le, Q.V.: MnasNet: Platform-Aware Neural Architecture Search for Mobile. arXiv:1807.11626[cs] (2019)

  43. Tan, M., Le, Q.V.: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv:1905.11946[cs, stat] (2020)

  44. Tan, M., Pang, R., Le, Q.V.: EfficientDet: Scalable and Efficient Object Detection. arXiv:1911.09070[cs, eess] (2020)

  45. Tripathi, S., Dane, G., Kang, B., Bhaskaran, V., Nguyen, T.: LCDet: low-complexity fully-convolutional neural networks for object detection in embedded systems. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 411–420 (2017).

  46. Venieris, S.I., Kouris, A., Bouganis, C.S.: Deploying Deep Neural Networks in the Embedded Space. arXiv:1806.08616[cs] (2018)

  47. Yang, T.J., Howard, A., Chen, B., Zhang, X., Go, A., Sandler, M., Sze, V., Adam, H.: NetAdapt: platform-aware neural network adaptation for mobile applications. In: European Conference on Computer Vision (ECCV) (2018)

  48. Zhao, Z., Zhang, Z., Xu, X., Xu, Y., Yan, H., Zhang, L.: A lightweight object detection network for real-time detection of driver handheld call on embedded devices. (2020)

  49. Zhao, Z.Q., Zheng, P., Xu, S.T., Wu, X.: Object detection with deep learning: a review. IEEE Transactions on Neural Networks and Learning Systems, pp. 1–21 (2019)



Acknowledgements

The authors acknowledge the São Paulo Research Foundation (FAPESP Grant 2019/07665-4) for supporting this project. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.

Author information

Authors and Affiliations



– Conceptualization: D. R. Meneghetti; T. P. D. Homem; J. H. R. de Oliveira; R. A. C. Bianchi

– Methodology: D. R. Meneghetti; T. P. D. Homem; D. H. Perico

– Software: D. R. Meneghetti

– Investigation: D. R. Meneghetti

– Formal Analysis: D. R. Meneghetti

– Validation: D. R. Meneghetti

– Data curation: D. R. Meneghetti; T. P. D. Homem; J. H. R. de Oliveira; I. J. da Silva; D. H. Perico; R. A. C. Bianchi

– Writing – original draft: D. R. Meneghetti; T. P. D. Homem

– Writing – review & editing: D. R. Meneghetti; T. P. D. Homem; D. H. Perico; R. A. C. Bianchi

– Visualization: D. R. Meneghetti; T. P. D. Homem; R. A. C. Bianchi

– Resources: D. R. Meneghetti; R. A. C. Bianchi

– Funding acquisition: D. R. Meneghetti; R. A. C. Bianchi

– Project administration: R. A. C. Bianchi

– Supervision: R. A. C. Bianchi

Corresponding author

Correspondence to Douglas De Rizzo Meneghetti.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.




Cite this article

Meneghetti, D.D.R., Homem, T.P.D., de Oliveira, J. et al. Detecting Soccer Balls with Reduced Neural Networks. J Intell Robot Syst 101, 53 (2021).
