RFSOD: a lightweight single-stage detector for real-time embedded applications to detect small-size objects

Abstract

Small-size object detection (SOD) is a challenging problem in computer vision. It is useful in defense, military, surveillance, medical, industrial, and sports-analysis applications. Various algorithms have been developed to address SOD; however, these algorithms are not suitable for real-time applications. In this work, a convolutional neural network architecture based on YOLO is proposed to improve the detection of small objects. The proposed network is inspired by residual blocks, DenseNet, the Feature Pyramid Network, cross stage partial connections, and 1 × 1 convolutions. The receptive field and the reuse of feature maps are the main factors in the design of the architecture, which is hence referred to as RFSOD. It is developed as a lightweight network to suit real-time applications and can run smoothly on single-board computers such as the Jetson Nano, TX2, and Raspberry Pi. The proposed model is evaluated on public datasets such as VHR10 and BCCD, and on a few small-size objects from the MS COCO dataset. This work is motivated by the need to develop a vision system for a badminton-playing robot; therefore, the proposed model is also tested on a custom-made shuttlecock dataset. The model's performance is compared with state-of-the-art deep learning models that are suitable for real-time applications. The proposed model was implemented in hardware on a Jetson Nano, a Raspberry Pi 4, and a laptop with an i5 processor. Improved detection accuracy was observed on small objects. Detection speed more than doubled on the Raspberry Pi and the i5 processor, while a 30% improvement was observed on the Jetson Nano with real-time videos.
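
Since the receptive field is named above as a main design factor, the following minimal sketch illustrates how the theoretical receptive field of a plain stack of convolution layers can be computed with the standard recurrence. The helper name `receptive_field` and the layer stack shown are hypothetical illustrations and are not the RFSOD architecture; dilation and padding effects are ignored.

```python
from typing import Iterable, Tuple


def receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Theoretical receptive field of sequential conv/pool layers.

    `layers` holds (kernel_size, stride) pairs ordered from input to output.
    Dilation is assumed to be 1 for every layer.
    """
    rf = 1    # a single input pixel sees only itself
    jump = 1  # distance, in input pixels, between adjacent output positions
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) jumps
        jump *= stride             # strides compound the spacing for later layers
    return rf


# Hypothetical stack (NOT the RFSOD layer configuration):
# 3x3 s1 -> 3x3 s2 -> 1x1 s1 -> 3x3 s2 -> 3x3 s1
example_stack = [(3, 1), (3, 2), (1, 1), (3, 2), (3, 1)]
print(receptive_field(example_stack))
```

Note that a 1 × 1 convolution leaves the receptive field unchanged, so it can mix channels without enlarging the spatial context of a feature.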

Availability of data and material

All the data used in this work are available from the corresponding author upon request.

Code availability

The code used in this work is available from the corresponding author upon request.

Acknowledgements

The authors would like to thank the Government of India for the Technical Education Quality Improvement Program (TEQIP III), the coordinators of TEQIP III, and the Dean, Research and Consultancy, at the National Institute of Technology Calicut for providing financial aid to procure a 2080 Ti GPU workstation and Baumer high-speed cameras.

Author information

Contributions

AAN: Conceptualization, methodology, data acquisition, data curation, interpretation of data, software, visualization, writing—original draft. SRV: Data collection, software. SAP: Formal analysis, writing—review and editing, supervision, funding acquisition. LA: Writing—review and editing, supervision.

Corresponding author

Correspondence to A. P. Sudheer.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Amudhan, A.N., Vrajesh, S.R., Sudheer, A.P. et al. RFSOD: a lightweight single-stage detector for real-time embedded applications to detect small-size objects. J Real-Time Image Proc (2021). https://doi.org/10.1007/s11554-021-01170-3

Keywords

  • Small object detection
  • Convolutional neural network
  • YOLO
  • Lightweight architecture
  • Object detection and tracking
  • Deep learning architecture