Evaluation of Deep Models for Real-Time Small Object Detection

  • Phuoc Pham
  • Duy Nguyen
  • Tien Do
  • Thanh Duc Ngo
  • Duy-Dinh Le
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10636)

Abstract

Real-time object detection is crucial for many applications. Approaches based on deep learning have achieved state-of-the-art performance on challenging datasets. Although several evaluations of these models have been conducted, no extensive evaluation has focused specifically on real-time small object detection. In this work, we present an in-depth evaluation of existing deep learning models for detecting small objects. We evaluate three state-of-the-art models, You Only Look Once (YOLO), Single Shot MultiBox Detector (SSD), and Faster R-CNN, with respect to related trade-off factors, i.e., accuracy, execution time, and resource constraints. Experiments were conducted on benchmark datasets and on a newly generated dataset for small object detection. All analyses and findings are then presented.

Keywords

  • Real-time object detection
  • Small object detection

Notes

Acknowledgement

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCM) under grant number B2017-26-01.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Phuoc Pham (1)
  • Duy Nguyen (1)
  • Tien Do (1)
  • Thanh Duc Ngo (1)
  • Duy-Dinh Le (1)

  1. Multimedia Communications Laboratory, University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam
