Crowd-aware Thresholded Loss for Object Detection in Wide Area Motion Imagery

  • Original Article
PFG – Journal of Photogrammetry, Remote Sensing and Geoinformation Science

Abstract

Detecting objects in Wide Area Motion Imagery (WAMI), an essential task for many practical applications, is particularly challenging in crowded scenes, such as areas with heavy traffic, because the pixel resolution of objects and the ground sampling distance are highly compromised and various factors disrupt the visual signal. To address this challenge, we design a framework that combines preprocessing operations and deep detectors. To improve detection performance in WAMI, especially in crowded areas, we propose a novel crowd-aware thresholded loss (CATLoss) function for training deep networks. Moreover, we introduce a hard sample mining method to strengthen the discriminative ability of the proposed solution. Additionally, we extend prior networks used in the literature with novel spatio-temporal cascaded architectures that incorporate more contextual information without introducing additional parameters. Overall, our approach is causal, more generalizable, and more robust, even at reduced spatial sizes. On the WPAFB-2009 dataset, we show that our solution performs better than or on par with the state of the art without introducing any additional computational complexity during inference. The code and trained models will be released at https://github.com/poyrazhatipoglu/CATLoss.

Availability of Data and Material

The data sources utilized in this study are already publicly available through their respective providers.

Code Availability

The code and trained models will be released at https://github.com/poyrazhatipoglu/CATLoss.

References

  • Aeschliman C, Park J, Kak AC (2014) Tracking vehicles through shadows and occlusions in wide-area aerial video. IEEE Trans Aerosp Electron Syst 50(1):429–444

  • AFRL U (2009) Wright-Patterson Air Force Base (WPAFB) dataset

  • Agarwal S, Furukawa Y, Snavely N, Simon I, Curless B, Seitz SM, Szeliski R (2011) Building Rome in a day. Commun ACM 54(10):105–112

  • Alcantarilla PF, Solutions T (2011) Fast explicit diffusion for accelerated features in nonlinear scale spaces. IEEE Trans Patt Anal Mach Intell 34(7):1281–1298

  • Andrew AM (2001) Multiple view geometry in computer vision. Kybernetes

  • Ao W, Fu Y, Hou X, Xu F (2019) Needles in a haystack: tracking city-scale moving vehicles from continuously moving satellite. IEEE Trans Image Process 29:1944–1957

  • Basharat A, Turek M, Xu Y, Atkins C, Stoup D, Fieldhouse K, Tunison P, Hoogs A (2014) Real-time multi-target tracking at 210 megapixels/second in wide area motion imagery. In: IEEE winter conference on applications of computer vision, IEEE, pp 839–846

  • Biewald L (2020) Experiment tracking with Weights and Biases. Software available from wandb.com

  • Brown LG (1992) A survey of image registration techniques. ACM Comput Surv (CSUR) 24(4):325–376

  • Brutzer S, Höferlin B, Heidemann G (2011) Evaluation of background subtraction techniques for video surveillance. In: CVPR 2011. IEEE, pp 1937–1944

  • Bürkle A, Essendorfer B (2010) Maritime surveillance with integrated systems. In: 2010 International WaterSide Security Conference. IEEE, pp 1–8

  • Canepa A, Ragusa E, Zunino R, Gastaldo P (2021) T-RexNet: a hardware-aware neural network for real-time detection of small moving objects. Sensors 21(4):1252

  • Carion N, Massa F, Synnaeve G, Usunier N, Kirillov A, Zagoruyko S (2020) End-to-end object detection with transformers. In: European conference on computer vision. Springer, pp 213–229

  • Chang HC, Lai SH, Lu KR (2006) A robust real-time video stabilization algorithm. J Vis Commun Image Represent 17(3):659–673

  • Chen H, Zhang L, Ma J, Zhang J (2019) Target heat-map network: an end-to-end deep network for target detection in remote sensing images. Neurocomputing 331:375–387

  • Chu X, Yang W, Ouyang W, Ma C, Yuille AL, Wang X (2017) Multi-context attention for human pose estimation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1831–1840

  • Dawn S, Saxena V, Sharma B (2010) Remote sensing image registration techniques: a survey. In: International conference on image and signal processing. Springer, pp 103–112

  • Dewancker I, McCourt M, Clark S (2016) Bayesian optimization for machine learning: a practical guidebook. arXiv preprint arXiv:1612.04858

  • Doherty P, Rudol P (2007) A UAV search and rescue scenario with human body detection and geolocalization. In: Australasian joint conference on artificial intelligence. Springer, pp 1–13

  • Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S et al (2020) An image is worth 16x16 words: transformers for image recognition at scale. arXiv preprint arXiv:2010.11929

  • Du D, Qi Y, Yu H, Yang Y, Duan K, Li G, Zhang W, Huang Q, Tian Q (2018) The unmanned aerial vehicle benchmark: object detection and tracking. In: Proceedings of the European conference on computer vision (ECCV), pp 370–386

  • Feather R, Davis JW (2011) Activity analysis in wide-area aerial surveillance video. Tech. rep., Ohio State University Columbus United States

  • Fehlmann S, Pontecorvo C, Booth DM, Janney P, Christie R, Redding NJ, Royce M, Fiebig M (2014) Fusion of multiple sensor data to recognise moving objects in wide area motion imagery. In: 2014 international conference on digital image computing: techniques and applications (DICTA), pp 1–8. https://doi.org/10.1109/DICTA.2014.7008110

  • Force UA (2007) Columbus large image format dataset 2007

  • Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT Press, Cambridge

  • Hartung C, Spraul R, Krüger W (2018) Improvement of persistent tracking in wide area motion imagery by CNN-based motion detections. In: Image and signal processing for remote sensing XXIV. SPIE, vol 10789, pp 249–258

  • Hatipoğlu P, Albayrak R, Alatan AA (2020) Object detection under moving cloud shadows in WAMI. ISPRS Ann Photogramm Remote Sens Spatial Inf Sci 2:837–844

  • He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE international conference on computer vision, pp 1026–1034

  • Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. PMLR, pp 448–456

  • Keck M, Galup L, Stauffer C (2013) Real-time tracking of low-resolution vehicles for wide-area persistent surveillance. In: 2013 IEEE workshop on applications of computer vision (WACV). IEEE, pp 441–448

  • Kent P, Maskell S, Payne O, Richardson S, Scarff L (2012) Robust background subtraction for automated detection and tracking of targets in wide area motion imagery. In: Optics and photonics for counterterrorism, crime fighting, and defence VIII, SPIE vol 8546, pp 208–219

  • Krausman JA, Miller DA (2015) The 12M™ tethered aerostat system: rapid tactical deployment for surveillance missions. In: 22nd AIAA lighter-than-air systems technology conference, p 3351

  • Krawczyk B (2016) Learning from imbalanced data: open challenges and future directions. Progr Artif Intell 5(4):221–232

  • LaLonde R, Zhang D, Shah M (2018) ClusterNet: detecting small objects in large scenes by exploiting spatio-temporal information. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4003–4012

  • Lei X, Pan H, Huang X (2019) A dilated CNN model for image classification. IEEE Access 7:124087–124095

  • Li L, Jamieson K, DeSalvo G, Rostamizadeh A, Talwalkar A (2017) Hyperband: a novel bandit-based approach to hyperparameter optimization. J Mach Learn Res 18(1):6765–6816

  • Liang P, Ling H, Blasch E, Seetharaman G, Shen D, Chen G (2013) Vehicle detection in wide area aerial surveillance using temporal context. In: Proceedings of the 16th international conference on information fusion. IEEE, pp 181–188

  • Lin Y, Medioni G (2007) Map-enhanced UAV image sequence registration and synchronization of multiple image sequences. In: 2007 IEEE conference on computer vision and pattern recognition. IEEE, pp 1–7

  • Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988

  • Loshchilov I, Hutter F (2018) Fixing weight decay regularization in Adam. https://openreview.net/forum?id=rk6qdGgCZ

  • Luo W, Li Y, Urtasun R, Zemel R (2016) Understanding the effective receptive field in deep convolutional neural networks. Adv Neural Inf Process Syst 29

  • Motorcu H, Ates HF, Ugurdag HF, Gunturk BK (2021) HM-Net: a regression network for object center detection and tracking on wide area motion imagery. IEEE Access 10:1346–1359

  • Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: ICML

  • Negin F, Tabejamaat M, Fraisse R, Bremond F (2022) Transforming temporal embeddings to keypoint heatmaps for detection of tiny vehicles in wide area motion imagery (WAMI) sequences. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 1432–1441

  • Ozyurt EO, Gunsel B (2018) WAMI object tracking using L1 tracker integrated with a deep detector. In: 2018 25th IEEE international conference on image processing (ICIP). IEEE, pp 2690–2694

  • Palaniappan K, Rao RM, Seetharaman G (2011) Wide-area persistent airborne video: architecture and challenges. In: Distributed video sensor networks. Springer, pp 349–371

  • Paszke A, Gross S, Massa F, Lerer A, Bradbury J, Chanan G, Killeen T, Lin Z, Gimelshein N, Antiga L et al (2019) PyTorch: an imperative style, high-performance deep learning library. Adv Neural Inf Process Syst 32

  • Perera AA, Srinivas C, Hoogs A, Brooksby G, Hu W (2006) Multi-object tracking through simultaneous long occlusions and split-merge conditions. In: 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06). IEEE, vol 1, pp 666–673

  • Pfister T, Charles J, Zisserman A (2015) Flowing convnets for human pose estimation in videos. In: Proceedings of the IEEE international conference on computer vision, pp 1913–1921

  • Pflugfelder R, Weissenfeld A, Wagner J (2020) On learning vehicle detection in satellite video. arXiv preprint arXiv:2001.10900

  • Pi Y, Nath ND, Behzadan AH (2020) Convolutional neural networks for object detection in aerial imagery for disaster response and recovery. Adv Eng Inform 43:101009

  • Pollard T, Antone M (2012) Detecting and tracking all moving objects in wide-area aerial video. In: 2012 IEEE computer society conference on computer vision and pattern recognition workshops. IEEE, pp 15–22

  • Reilly V, Idrees H, Shah M (2010) Detection and tracking of large number of targets in wide area surveillance. In: European conference on computer vision. Springer, pp 186–199

  • Saleemi I, Shah M (2013) Multiframe many-many point correspondence for vehicle tracking in high density wide area aerial videos. Int J Comput Vis 104(2):198–219

  • Shi X, Ling H, Blasch E, Hu W (2012) Context-driven moving vehicle detection in wide area motion imagery. In: Proceedings of the 21st international conference on pattern recognition (ICPR2012). IEEE, pp 2512–2515

  • Smith LN, Topin N (2019) Super-convergence: very fast training of neural networks using large learning rates. Artificial intelligence and machine learning for multi-domain operations applications. SPIE, vol 11006, pp 369–386

  • Sodemann AA, Ross MP, Borghetti BJ (2012) A review of anomaly detection in automated surveillance. IEEE Trans Syst Man Cybern Part C (Applications and Reviews) 42(6):1257–1272

  • Sommer LW, Teutsch M, Schuchert T, Beyerer J (2016) A survey on moving object detection for wide area motion imagery. In: 2016 IEEE winter conference on applications of computer vision (WACV). IEEE, pp 1–9

  • Spruyt V, Ledda A, Philips W (2013) Sparse optical flow regularization for real-time visual tracking. In: 2013 IEEE international conference on multimedia and expo (ICME). IEEE, pp 1–6

  • Teutsch M, Grinberg M (2016) Robust detection of moving vehicles in wide area motion imagery. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 27–35

  • Van Brummelen G (2012) Heavenly mathematics: the forgotten art of spherical trigonometry. Princeton University Press, Princeton

  • Vella E, Azim A, Gaetjens HX, Repasky B, Payne T (2019) Improved detection for WAMI using background contextual information. In: 2019 digital image computing: techniques and applications (DICTA). IEEE, pp 1–9

  • Xiao J, Cheng H, Sawhney H, Han F (2010) Vehicle detection and tracking in wide field-of-view aerial video. In: 2010 IEEE computer society conference on computer vision and pattern recognition. IEEE, pp 679–684

  • Yong H, Huang J, Meng D, Hua X, Zhang L (2020) Momentum batch normalization for deep learning with small batch size. In: European conference on computer vision. Springer, pp 224–240

  • Zheng E, Wu C (2015) Structure from motion using structure-less resection. In: Proceedings of the IEEE international conference on computer vision, pp 2075–2083

  • Zhou Y, Maskell S (2019) Detecting and tracking small moving objects in wide area motion imagery (WAMI) using convolutional neural networks (CNNs). In: 2019 22th international conference on information fusion (FUSION). IEEE, pp 1–8

  • Zivkovic Z (2004) Improved adaptive Gaussian mixture model for background subtraction. In: Proceedings of the 17th international conference on pattern recognition, 2004. ICPR 2004. IEEE, vol 2, pp 28–31

Funding

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Author information

Contributions

Conceptualization: [PUH]; Methodology: [PUH]; Software: [PUH]; Validation: [PUH]; Investigation: [PUH]; Writing – Original Draft: [PUH]; Visualization: [PUH]; Supervision: [CI, SK]; Validation: [CI, SK]; Writing – Review and Editing: [CI, SK].

Corresponding author

Correspondence to Poyraz Umut Hatipoglu.

Ethics declarations

Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix

A.1 Visual Outputs

See Fig. 11.

Fig. 11

A sample frame from AOI 04 and the corresponding processing outputs. Foreground pixel regions are marked with the highest intensity value on a black-and-white image, as shown in (b). Target regions are shown as green rectangles on top of the ground truth points in (c) and (d). The red dots mark the centres of patches that include detected objects in (c) and the centres of the objects after the localization operation in (d). Because the centre of every patch encompassing any part of a target is marked, multiple red dots appear in close proximity to each other in (c), whereas only the estimated centres of the targets are indicated in (d).

A.2 Haversine Distance

The distance traveled by a target between two consecutive frames, and hence its speed, can be calculated from the latitude (\(\varphi\)) and longitude (\(\lambda\)) coordinates of the target and the time elapsed between the two frames. To approximate the distance between two points (\(\varphi _1\), \(\lambda _1\)) and (\(\varphi _2\), \(\lambda _2\)) on the Earth’s surface, the Haversine distance (Van Brummelen 2012), formulated in Eq. (9), is used.

$$d = 2r \arcsin \left(\sqrt{\sin ^2\left(\frac{\varphi _2 -\varphi _1}{2}\right)+ \cos \varphi _1 \cos \varphi _2 \sin ^2\left(\frac{\lambda _2 -\lambda _1}{2}\right)}\right)$$
(9)

where \(r\) is the approximate radius of the Earth and all angles are expressed in radians.
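
As a minimal sketch of how Eq. (9) could be evaluated in practice, the Python snippet below computes the Haversine distance between two target positions and derives a speed from the elapsed time. The function names, the choice of decimal degrees as input, and the mean Earth radius of 6,371 km are assumptions made for illustration; they are not taken from the released code.

```python
import math

# Assumed value: commonly used mean Earth radius in metres.
EARTH_RADIUS_M = 6_371_000.0


def haversine_distance(lat1_deg, lon1_deg, lat2_deg, lon2_deg,
                       radius=EARTH_RADIUS_M):
    """Great-circle distance of Eq. (9) for coordinates in decimal degrees."""
    # Trigonometric functions expect radians.
    phi1 = math.radians(lat1_deg)
    phi2 = math.radians(lat2_deg)
    dphi = phi2 - phi1
    dlam = math.radians(lon2_deg - lon1_deg)

    # Term under the square root in Eq. (9).
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)

    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))


def target_speed(lat1_deg, lon1_deg, lat2_deg, lon2_deg, elapsed_s):
    """Average speed in m/s between two consecutive frames (hypothetical helper)."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    distance = haversine_distance(lat1_deg, lon1_deg, lat2_deg, lon2_deg)
    return distance / elapsed_s


if __name__ == "__main__":
    # Illustrative coordinates and frame interval only, not dataset values.
    d = haversine_distance(39.7800, -84.1000, 39.7801, -84.1001)
    v = target_speed(39.7800, -84.1000, 39.7801, -84.1001, elapsed_s=1.0)
    print(f"distance: {d:.2f} m, speed: {v:.2f} m/s")
```

For the short displacements observed between consecutive frames, the result is insensitive to the exact radius chosen, so any reasonable approximation of \(r\) should give comparable speeds.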

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Hatipoglu, P.U., Iyigun, C. & Kalkan, S. Crowd-aware Thresholded Loss for Object Detection in Wide Area Motion Imagery. PFG 91, 339–364 (2023). https://doi.org/10.1007/s41064-023-00253-z

  • DOI: https://doi.org/10.1007/s41064-023-00253-z
