Abstract
This paper proposes a simple and effective single-shot detector for detecting and counting cars in aerial images. The proposed model, called the heatmap learner convolutional neural network (HLCNN), predicts a heatmap of the target car instances. To learn this heatmap, we modify the CNN architecture by replacing the fully connected layers with three convolutional layers that act as adaptation layers. VGG-16 serves as the backbone convolutional neural network of the proposed model. The proposed method successfully determines the number of cars and precisely detects the centers of the target cars. Experiments on two car datasets (PUCPR+ and CARPK) show that the proposed method achieves state-of-the-art counting and localization performance in comparison with existing methods. Further experiments examine the effect of data augmentation and batch normalization on the performance of the proposed method. The code and data will be made available at https://www.github.com/ekilic/Heatmap-Learner-CNN-for-Object-Counting.
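To make the heatmap idea concrete, the following is a minimal sketch of how a center heatmap can encode car positions and how a count can be read back from it. The abstract does not state how the target heatmaps are built or how peaks are extracted, so the Gaussian targets, the 3x3 local-maximum rule, the threshold, and the function names `gaussian_heatmap` and `find_peaks` are all assumptions made for illustration, not the authors' implementation.

```python
import math


def gaussian_heatmap(height, width, centers, sigma=2.0):
    """Render one Gaussian bump per (row, col) center (assumed target encoding).

    Where bumps overlap, the larger value is kept.
    """
    heatmap = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for cr, cc in centers:
                d2 = (r - cr) ** 2 + (c - cc) ** 2
                value = math.exp(-d2 / (2.0 * sigma ** 2))
                if value > heatmap[r][c]:
                    heatmap[r][c] = value
    return heatmap


def find_peaks(heatmap, threshold=0.5):
    """Return cells above the threshold that are strict maxima of their 3x3 neighbourhood.

    This peak rule is an assumption; the count is simply the number of peaks.
    """
    height, width = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(height):
        for c in range(width):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


# Hypothetical example: three car centers on a 24 x 32 grid.
true_centers = [(5, 5), (5, 20), (15, 12)]
predicted = gaussian_heatmap(24, 32, true_centers)  # stands in for a network output
detections = find_peaks(predicted)
print(len(detections), detections)
```

Here the synthetic heatmap stands in for a network prediction; with a real model, the same peak search would run on the predicted map, and the threshold would need tuning on validation data.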
Acknowledgements
This work is supported by Erciyes University, the Department of Research Projects under Contract FDK-2018-8624.
Cite this article
Kilic, E., Ozturk, S. An accurate car counting in aerial images based on convolutional neural networks. J Ambient Intell Human Comput 14, 1259–1268 (2023). https://doi.org/10.1007/s12652-021-03377-5
DOI: https://doi.org/10.1007/s12652-021-03377-5