Small-objectness sensitive detection based on shifted single shot detector

Fang, Liangji; Zhao, Xu; Zhang, Shiquan

doi:10.1007/s11042-018-6227-7


Published: 14 June 2018

Volume 78, pages 13227–13245, (2019)

Multimedia Tools and Applications


Abstract

We present a small-object-sensitive method for object detection. Our method is built on SSD (Single Shot MultiBox Detector; Liu et al. 2016), a simple but effective deep neural network for image object detection. The discrete nature of the anchor mechanism used in SSD, however, may cause missed detections for small objects located in the gaps between anchor boxes. SSD performs better on small-object detection after circular shifts of the input image. Therefore, we generate auxiliary feature maps by applying circular shifts to the lower extra feature maps in SSD for small-object detection, which is equivalent to shifting the objects to fit the locations of the anchor boxes. We call the proposed system Shifted SSD. Moreover, precise localization is vital to small-object detection. Hence, we propose two novel methods, Smooth NMS and an IoU-Prediction module, to obtain more precise locations. For video sequences, we then generate trajectory hypotheses to obtain predicted locations in a new frame, which further improves performance. Experiments on PASCAL VOC 2007, MS COCO, KITTI and our small-object video datasets validate that both mAP and recall improve to varying degrees, while the speed remains almost the same as SSD.
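
The circular shift of a feature map mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name circular_shift, the (C, H, W) layout, and the shift amounts are assumptions chosen for illustration, and the abstract does not state which shifts Shifted SSD actually uses.

```python
import numpy as np

def circular_shift(feature_map, dx, dy):
    """Circularly shift a (C, H, W) feature map by dx columns and dy rows.

    Hypothetical helper: values pushed past one border wrap around
    to the opposite border, so the shape is unchanged.
    """
    return np.roll(feature_map, shift=(dy, dx), axis=(1, 2))

# Toy feature map with 2 channels, 3 rows, 4 columns (assumed sizes).
fmap = np.arange(2 * 3 * 4).reshape(2, 3, 4)

# Shift one cell to the right; the last column wraps to the front.
shifted = circular_shift(fmap, dx=1, dy=0)

# Shifting back by the same amount recovers the original map.
restored = circular_shift(shifted, dx=-1, dy=0)
```

In this sketch a prediction head applied to shifted would see each object one cell away from its original position relative to the fixed anchor grid, which is the intuition the abstract describes.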


Notes

For the SSD300* model, the step between anchor boxes is 8 pixels on the lowest prediction layer and 16 pixels on the next layer, so half the grid length is 4 and 8 pixels, respectively.
For simplicity, the left-shift direction is treated as equivalent to the right-shift direction.
Small objects show little visible difference in localization accuracy. For clarity of visualization, we illustrate the Reverse out phenomenon only with examples of big objects.
This is because we do not know the exact matching between the ground-truth target and the predictions of the lowest layer.
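
The arithmetic in the first note can be checked with a short sketch. It assumes anchor centers sit at (i + 0.5) * step along each axis and that the lowest prediction layer has 38 cells per side and the next has 19; these grid sizes and the helper names are assumptions for illustration, not values stated in the notes.

```python
def anchor_centers(num_cells, step):
    """Anchor center coordinates along one axis (assumed half-cell offset)."""
    return [(i + 0.5) * step for i in range(num_cells)]

def offset_to_nearest(x, centers):
    """Distance from coordinate x to the closest anchor center."""
    return min(abs(x - c) for c in centers)

# Lowest prediction layer: step of 8 pixels (from the note), 38 cells assumed.
centers_low = anchor_centers(num_cells=38, step=8)
# Next layer: step of 16 pixels (from the note), 19 cells assumed.
centers_next = anchor_centers(num_cells=19, step=16)

# A point on a cell boundary is as far from any anchor center as possible,
# which is half the step on each layer.
worst_low = offset_to_nearest(8.0, centers_low)
worst_next = offset_to_nearest(16.0, centers_next)
```

Under these assumptions the worst-case offset is half the step on each layer, matching the 4 and 8 pixels given in the note.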

References

Bell S, Lawrence Zitnick C, Bala K, Girshick R (2016) Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2874–2883
Bromley J, Bentz JW, Bottou L, Guyon I, LeCun Y, Moore C, Säckinger E, Shah R (1993) Signature verification using a siamese time delay neural network. Int J Pattern Recognit Artif Intell 7(04):669–688
Chen C, Liu MY, Tuzel O, Xiao J (2016) R-cnn for small object detection. In: Asian conference on computer vision, pp 214–230. Springer
Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected crfs. arXiv:1412.7062
Everingham M, Van Gool L, Williams C, Winn J, Zisserman A (2008) The pascal visual object classes challenge 2007 (voc 2007) results (2007)
Everingham M, Winn J (2007) The pascal visual object classes challenge 2007 (voc2007) development kit. University of Leeds, Tech. Rep
Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) Dssd: Deconvolutional single shot detector. arXiv:1701.06659
Geiger A, Lenz P, Urtasun R (2012) Are we ready for autonomous driving? the kitti vision benchmark suite. In: Conference on computer vision and pattern recognition (CVPR)
Gidaris S, Komodakis N (2016) Locnet: Improving localization accuracy for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 789–798
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Hariharan B, Arbeláez P, Girshick R, Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 447–456
He K, Zhang X, Ren S, Sun J (2014) Spatial pyramid pooling in deep convolutional networks for visual recognition. In: European Conference on Computer Vision, pp 346–361. Springer
He K, Zhang X, Ren S, Sun J (2015) Deep residual learning for image recognition. arXiv:1512.03385
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Henriques JF, Caseiro R, Martins P, Batista J (2015) High-speed tracking with kernelized correlation filters. IEEE Trans Pattern Anal Mach Intell 37(3):583–596
Hoiem D, Chodpathumwan Y, Dai Q (2012) Diagnosing error in object detectors. In: European conference on computer vision, pp 340–353. Springer
Hong C, Yu J, Tao D, Wang M (2015) Image-based three-dimensional human pose recovery by multiview locality-sensitive sparse retrieval. IEEE Trans Ind Electron 62(6):3742–3751
Hong C, Yu J, Wan J, Tao D, Wang M (2015) Multimodal deep autoencoder for human pose recovery. IEEE Trans Image Process 24(12):5659–5670
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: Convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on Multimedia, pp 675–678. ACM
Kong T, Sun F, Yao A, Liu H, Lu M, Chen Y (2017) Ron: Reverse connection with objectness prior networks for object detection. arXiv:1707.01691
Li Z, Liu J, Tang J, Lu H (2015) Robust structured subspace learning for data representation. IEEE Trans Pattern Anal Mach Intell 37(10):2085–2098
Li Z, Liu J, Yang Y, Zhou X, Lu H (2014) Clustering-guided sparse structural learning for unsupervised feature selection. IEEE Trans Knowl Data Eng 26(9):2138–2150
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: Common objects in context. In: European conference on computer vision, pp 740–755. Springer
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: Single shot multibox detector. In: European conference on computer vision, pp 21–37. Springer
Liu W, Rabinovich A, Berg AC (2015). arXiv:1506.04579
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Milan A, Leal-Taixé L, Schindler K, Reid I (2015) Joint tracking and segmentation of multiple targets. In: CVPR
Park S, Kwak N Analysis on the dropout effect in convolutional neural networks
Pirsiavash H, Ramanan D, Fowlkes CC (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. In: 2011 IEEE conference on computer vision and pattern recognition (CVPR), pp 1201–1208. IEEE
Redmon J, Divvala S, Girshick R, Farhadi A (2015) You only look once: Unified, real-time object detection. arXiv:1506.02640
Redmon J, Farhadi A (2016) Yolo9000: Better, faster, stronger. arXiv:1612.08242
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Tychsen-Smith L, Petersson L (2017) Denet: Scalable real-time object detection with directed sparse sampling. arXiv:1703.10295
Wang X, Han TX, Yan S (2009) An hog-lbp human detector with partial occlusion handling. In: 2009 IEEE 12th international conference on computer vision, pp 32–39. IEEE
Xiang Y, Choi W, Lin Y, Savarese S (2015) Data-driven 3d voxel patterns for object category recognition. In: Proceedings of the IEEE international conference on computer vision and pattern recognition
Xiang Y, Choi W, Lin Y, Savarese S (2017) Subcategory-aware convolutional neural networks for object proposals and detection. In: 2017 IEEE winter conference on applications of computer vision (WACV), pp 924–933. IEEE
Yu F, Koltun V (2015) Multi-scale context aggregation by dilated convolutions. arXiv:1511.07122
Yu J, Hong C, Rui Y, Tao D (2017) Multi-task autoencoder model for recovering human poses. IEEE Transactions on Industrial Electronics
Yu J, Zhang B, Kuang Z, Lin D, Fan J (2017) Iprivacy: image privacy protection by identifying sensitive objects via deep multi-task learning. IEEE Trans Inf Forensics Secur 12(5):1005–1016
Zhang L, Lin L, Liang X, He K (2016) Is faster r-cnn doing well for pedestrian detection? In: European conference on computer vision, pp 443–457. Springer
Zhou H, Li Z, Ning C, Tang J (2017) Cad: Scale invariant framework for real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 760–768


Acknowledgments

This research is supported by NSFC funding (61673269, 61273285).

Author information

Authors and Affiliations

Department of Automation, Shanghai Jiao Tong University, Shanghai, China
Liangji Fang, Xu Zhao & Shiquan Zhang


Corresponding author

Correspondence to Xu Zhao.


About this article

Cite this article

Fang, L., Zhao, X. & Zhang, S. Small-objectness sensitive detection based on shifted single shot detector. Multimed Tools Appl 78, 13227–13245 (2019). https://doi.org/10.1007/s11042-018-6227-7


Received: 28 July 2017
Revised: 27 December 2017
Accepted: 29 May 2018
Published: 14 June 2018
Issue Date: 30 May 2019
DOI: https://doi.org/10.1007/s11042-018-6227-7
