Crowd-aware Thresholded Loss for Object Detection in Wide Area Motion Imagery

Hatipoglu, Poyraz Umut; Iyigun, Cem; Kalkan, Sinan

doi:10.1007/s41064-023-00253-z

Original Article
Published: 24 July 2023

Volume 91, pages 339–364, (2023)
PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science

Abstract

Detecting objects in Wide Area Motion Imagery (WAMI), an essential task for many practical applications, is particularly challenging in crowded scenes, such as areas with heavy traffic, since the pixel resolution of objects and the ground sampling distance are heavily compromised and various factors disrupt the visual signal. To address this challenge, we design a framework that combines preprocessing operations with deep detectors. To train deep networks for WAMI detection with improved performance, especially in crowded areas, we propose a novel crowd-aware thresholded loss (CATLoss) function. Moreover, we introduce a hard-sample mining method to strengthen the discriminative ability of the proposed solution. Additionally, we extend networks previously used in the literature with novel spatio-temporal cascaded architectures that incorporate more contextual information without introducing additional parameters. Overall, our approach is causal, more generalizable, and more robust, even at reduced spatial sizes. On the WPAFB-2009 dataset, we show that our solution performs better than or on par with the state of the art without introducing any additional computational complexity during inference. The code and trained models will be released at https://github.com/poyrazhatipoglu/CATLoss.

Availability of Data and Material

The data sources utilized in this study are already publicly available through their respective providers.

Code Availability

The code and trained models will be released at https://github.com/poyrazhatipoglu/CATLoss.

References

Aeschliman C, Park J, Kak AC (2014) Tracking vehicles through shadows and occlusions in wide-area aerial video. IEEE Trans Aerosp Electron Syst 50(1):429–444
AFRL U (2009) Wright-Patterson Air Force Base (WPAFB) dataset
Agarwal S, Furukawa Y, Snavely N, Simon I, Curless B, Seitz SM, Szeliski R (2011) Building Rome in a day. Commun ACM 54(10):105–112
Alcantarilla PF, Solutions T (2011) Fast explicit diffusion for accelerated features in nonlinear scale spaces. IEEE Trans Patt Anal Mach Intell 34(7):1281–1298
Andrew AM (2001) Multiple view geometry in computer vision. Kybernetes
Ao W, Fu Y, Hou X, Xu F (2019) Needles in a haystack: tracking city-scale moving vehicles from continuously moving satellite. IEEE Trans Image Process 29:1944–1957
Basharat A, Turek M, Xu Y, Atkins C, Stoup D, Fieldhouse K, Tunison P, Hoogs A (2014) Real-time multi-target tracking at 210 megapixels/second in wide area motion imagery. In: IEEE winter conference on applications of computer vision, IEEE, pp 839–846
Biewald L (2020) Experiment tracking with weights and biases. Software available from wandb.com
Brown LG (1992) A survey of image registration techniques. ACM Comput Surv (CSUR) 24(4):325–376
Brutzer S, Höferlin B, Heidemann G (2011) Evaluation of background subtraction techniques for video surveillance. In: CVPR 2011. IEEE, pp 1937–1944
Bürkle A, Essendorfer B (2010) Maritime surveillance with integrated systems. In: 2010 International WaterSide Security Conference. IEEE, pp 1–8
Canepa A, Ragusa E, Zunino R, Gastaldo P (2021) T-rexnet-a hardware-aware neural network for real-time detection of small moving objects. Sensors 21(4):1252
Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European conference on computer vision. Springer, pp 213–229
Chang HC, Lai SH, Lu KR (2006) A robust real-time video stabilization algorithm. J Vis Commun Image Represent 17(3):659–673
Chen H, Zhang L, Ma J, Zhang J (2019) Target heat-map network: an end-to-end deep network for target detection in remote sensing images. Neurocomputing 331:375–387
Chu X, Yang W, Ouyang W, Ma C, Yuille AL, Wang X (2017) Multi-context attention for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1831–1840
Dawn S, Saxena V, Sharma B (2010) Remote sensing image registration techniques: a survey. In: International conference on image and signal processing. Springer, pp 103–112
Dewancker I, McCourt M, Clark S (2016) Bayesian optimization for machine learning: a practical guidebook. arXiv preprint arXiv:1612.04858
Doherty P, Rudol P (2007) A uav search and rescue scenario with human body detection and geolocalization. In: Australasian joint conference on Artificial Intelligence. Springer, pp 1–13
Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929
Du D, Qi Y, Yu H, Yang Y, Duan K, Li G, Zhang W, Huang Q, Tian Q (2018) The unmanned aerial vehicle benchmark: object detection and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 370–386
Feather R, Davis JW (2011) Activity analysis in wide-area aerial surveillance video. Tech. rep., Ohio State University Columbus United States
Fehlmann S, Pontecorvo C, Booth DM, Janney P, Christie R, Redding NJ, Royce M, Fiebig M (2014) Fusion of multiple sensor data to recognise moving objects in wide area motion imagery. In: 2014 international conference on digital image computing: techniques and applications (DICTA), pp 1–8. https://doi.org/10.1109/DICTA.2014.7008110
Force UA (2007) Columbus large image format dataset 2007
Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge
Hartung C, Spraul R, Krüger W (2018) Improvement of persistent tracking in wide area motion imagery by CNN-based motion detections. In: Image and signal processing for remote sensing XXIV. SPIE, vol 10789, pp 249–258
Hatipoğlu P, Albayrak R, Alatan AA (2020) Object detection under moving cloud shadows in WAMI. ISPRS Ann Photogramm Remote Sens Spatial Inf Sci 2:837–844
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034
Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. PMLR, pp 448–456
Keck M, Galup L, Stauffer C (2013) Real-time tracking of low-resolution vehicles for wide-area persistent surveillance. In: 2013 IEEE workshop on applications of computer vision (WACV). IEEE, pp 441–448
Kent P, Maskell S, Payne O, Richardson S, Scarff L (2012) Robust background subtraction for automated detection and tracking of targets in wide area motion imagery. In: Optics and photonics for counterterrorism, crime fighting, and defence VIII, SPIE vol 8546, pp 208–219
Krausman JA, Miller DA (2015) The 12m™ tethered aerostat system: rapid tactical deployment for surveillance missions. In: 22nd AIAA lighter-than-air systems technology conference, p 3351
Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Progr Artif Intell 5(4):221–232
LaLonde R, Zhang D, Shah M (2018) Clusternet: Detecting small objects in large scenes by exploiting spatio-temporal information. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4003–4012
Lei X, Pan H, Huang X (2019) A dilated CNN model for image classification. IEEE Access 7:124087–124095
Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A (2017) Hyperband: a novel bandit-based approach to hyperparameter optimization. J Mach Learn Res 18(1):6765–6816
Liang P, Ling H, Blasch E, Seetharaman G, Shen D, Chen G (2013) Vehicle detection in wide area aerial surveillance using temporal context. In: Proceedings of the 16th international conference on information fusion. IEEE, pp 181–188
Lin Y, Medioni G (2007) Map-enhanced uav image sequence registration and synchronization of multiple image sequences. In: 2007 IEEE conference on computer vision and pattern recognition. IEEE, pp 1–7
Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988
Loshchilov I, Hutter F (2018) Fixing weight decay regularization in Adam. https://openreview.net/forum?id=rk6qdGgCZ
Luo W, Li Y, Urtasun R, Zemel R (2016) Understanding the effective receptive field in deep convolutional neural networks. Adv Neural Inf Process Syst 29
Motorcu H, Ates HF, Ugurdag HF, Gunturk BK (2021) Hm-net: a regression network for object center detection and tracking on wide area motion imagery. IEEE Access 10:1346–1359
Nair V, Hinton GE (2010) Rectified linear units improve restricted boltzmann machines. In: ICML
Negin F, Tabejamaat M, Fraisse R, Bremond F (2022) Transforming temporal embeddings to keypoint heatmaps for detection of tiny vehicles in wide area motion imagery (wami) sequences. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1432–1441
Ozyurt EO, Gunsel B (2018) Wami object tracking using L1 tracker integrated with a deep detector. In: 2018 25th IEEE international conference on image processing (ICIP). IEEE, pp 2690–2694
Palaniappan K, Rao RM, Seetharaman G (2011) Wide-area persistent airborne video: architecture and challenges. In: Distributed video sensor networks. Springer, pp 349–371
Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L, et al. (2019) Pytorch: an imperative style, high-performance deep learning library. Adv Neural Inf Process Syst 32
Perera AA, Srinivas C, Hoogs A, Brooksby G, Hu W (2006) Multi-object tracking through simultaneous long occlusions and split-merge conditions. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06). IEEE, vol 1, pp 666–673
Pfister T, Charles J, Zisserman A (2015) Flowing convnets for human pose estimation in videos. In: Proceedings of the IEEE international conference on computer vision, pp 1913–1921
Pflugfelder R, Weissenfeld A, Wagner J (2020) On learning vehicle detection in satellite video. arXiv preprint arXiv:2001.10900
Pi Y, Nath ND, Behzadan AH (2020) Convolutional neural networks for object detection in aerial imagery for disaster response and recovery. Adv Eng Inform 43:101009
Pollard T, Antone M (2012) Detecting and tracking all moving objects in wide-area aerial video. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops. IEEE, pp 15–22
Reilly V, Idrees H, Shah M (2010) Detection and tracking of large number of targets in wide area surveillance. In: European conference on computer vision. Springer, pp 186–199
Saleemi I, Shah M (2013) Multiframe many-many point correspondence for vehicle tracking in high density wide area aerial videos. Int J Comput Vis 104(2):198–219
Shi X, Ling H, Blasch E, Hu W (2012) Context-driven moving vehicle detection in wide area motion imagery. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 2512–2515
Smith LN, Topin N (2019) Super-convergence: very fast training of neural networks using large learning rates. Artificial intelligence and machine learning for multi-domain operations applications. SPIE, vol 11006, pp 369–386
Sodemann AA, Ross MP, Borghetti BJ (2012) A review of anomaly detection in automated surveillance. IEEE Trans Syst Man Cybern Part C (Applications and Reviews) 42(6):1257–1272
Sommer LW, Teutsch M, Schuchert T, Beyerer J (2016) A survey on moving object detection for wide area motion imagery. In: 2016 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1–9
Spruyt V, Ledda A, Philips W (2013) Sparse optical flow regularization for real-time visual tracking. In: 2013 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6
Teutsch M, Grinberg M (2016) Robust detection of moving vehicles in wide area motion imagery. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 27–35
Van Brummelen G (2012) Heavenly mathematics: the forgotten art of spherical trigonometry. Princeton University Press, Princeton
Vella E, Azim A, Gaetjens HX, Repasky B, Payne T (2019) Improved detection for wami using background contextual information. In: 2019 digital image computing: techniques and applications (DICTA). IEEE, pp 1–9
Xiao J, Cheng H, Sawhney H, Han F (2010) Vehicle detection and tracking in wide field-of-view aerial video. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 679–684
Yong H, Huang J, Meng D, Hua X, Zhang L (2020) Momentum batch normalization for deep learning with small batch size. In: European conference on computer vision. Springer, pp 224–240
Zheng E, Wu C (2015) Structure from motion using structure-less resection. In: Proceedings of the IEEE international conference on computer vision, pp 2075–2083
Zhou Y, Maskell S (2019) Detecting and tracking small moving objects in wide area motion imagery (wami) using convolutional neural networks (cnns). In: 2019 22th international conference on information fusion (FUSION). IEEE, pp 1–8
Zivkovic Z (2004) Improved adaptive gaussian mixture model for background subtraction. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004. IEEE, vol 2, pp 28–31

Funding

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Author information

Authors and Affiliations

Industrial Engineering, Middle East Technical University, 06800, Cankaya, Ankara, Turkey
Poyraz Umut Hatipoglu & Cem Iyigun
Computer Engineering, Middle East Technical University, 06800, Cankaya, Ankara, Turkey
Sinan Kalkan

Authors

Poyraz Umut Hatipoglu
Cem Iyigun
Sinan Kalkan

Contributions

Conceptualization: [PUH]; Methodology: [PUH]; Software: [PUH]; Validation: [PUH]; Investigation: [PUH]; Writing – Original Draft: [PUH]; Visualization: [PUH]; Supervision: [CI, SK]; Validation: [CI, SK]; Writing – Review and Editing: [CI, SK].

Corresponding author

Correspondence to Poyraz Umut Hatipoglu.

Ethics declarations

Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix

A.1 Visual Outputs

See Fig. 11.

A.2 Haversine Distance

The distance a target travels between two consecutive frames, and hence its speed, can be calculated from the latitude ($\varphi$) and longitude ($\lambda$) coordinates of the target in each frame and the time elapsed between the two frames. To calculate the approximate distance between two points $(\varphi_1, \lambda_1)$ and $(\varphi_2, \lambda_2)$ on the Earth's surface, the Haversine distance (Van Brummelen 2012), formulated in Eq. (9), is used.

$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right) \tag{9}$$

where $r$ is the approximate radius of the Earth and all angles are expressed in radians.
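
As an illustration of Eq. (9), the following is a minimal Python sketch, not the authors' implementation. The function names `haversine_distance` and `target_speed`, the mean Earth radius of 6,371 km, and the degree-valued inputs are assumptions made for this example; the text above only specifies an approximate radius $r$.

```python
import math

# Assumption: mean Earth radius in metres; the text only says "approximate".
EARTH_RADIUS_M = 6_371_000.0


def haversine_distance(lat1_deg, lon1_deg, lat2_deg, lon2_deg, radius=EARTH_RADIUS_M):
    """Great-circle distance between two latitude/longitude points, as in Eq. (9)."""
    # Trigonometric functions expect radians, so convert the degree inputs first.
    phi1, phi2 = math.radians(lat1_deg), math.radians(lat2_deg)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2_deg - lon1_deg)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to [0, 1] to guard against floating-point overshoot before sqrt/asin.
    a = min(1.0, max(0.0, a))
    return 2 * radius * math.asin(math.sqrt(a))


def target_speed(p_prev, p_curr, dt_seconds, radius=EARTH_RADIUS_M):
    """Average speed between two consecutive frames that are dt_seconds apart.

    p_prev and p_curr are (latitude, longitude) tuples in degrees; the result
    is in radius units per second (m/s with the default radius).
    """
    if dt_seconds <= 0:
        raise ValueError("dt_seconds must be positive")
    return haversine_distance(*p_prev, *p_curr, radius=radius) / dt_seconds
```

Given the positions of a target in two consecutive frames and the frame interval, `target_speed(p_prev, p_curr, dt_seconds)` returns the average speed over that interval. The frame interval itself depends on the sensor and is not fixed by this sketch.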

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hatipoglu, P.U., Iyigun, C. & Kalkan, S. Crowd-aware Thresholded Loss for Object Detection in Wide Area Motion Imagery. PFG 91, 339–364 (2023). https://doi.org/10.1007/s41064-023-00253-z

Received: 01 May 2023
Accepted: 10 July 2023
Published: 24 July 2023
Issue Date: October 2023
DOI: https://doi.org/10.1007/s41064-023-00253-z
