
A vehicle detection scheme based on two-dimensional HOG features in the DFT and DCT domains

Abstract

Histograms of oriented gradients (HOG) are often used as features for object detection in images, since they are robust to changes in illumination and environmental conditions. However, these features are not invariant to changes in the resolution of the input image. A 2D representation of these features, referred to as 2DHOG features, has been used instead, since it preserves the relations among neighboring pixels or cells. In this paper, a new vehicle detection scheme using transform-domain 2DHOG features is proposed. The method is based on extracting the 2DHOG features from the input image and applying to them the 2D discrete Fourier transform (DFT) or the 2D discrete cosine transform (DCT). This is followed by a truncation process through which only the low-frequency coefficients, referred to as the transform-domain 2DHOG (TD2DHOG) features, are retained. It is shown that the TD2DHOG features obtained from an image at its original resolution and those obtained from a downsampled version of the same image are approximately equal within a multiplicative factor. This property is then utilized in the proposed scheme to detect vehicles at various resolutions using a single classifier rather than multiple resolution-specific classifiers. Experimental results show that using a single classifier in the proposed detection scheme drastically reduces the training and storage cost compared with a classifier pyramid, while providing detection accuracy similar to that obtained using TD2DHOG features with a classifier pyramid. Furthermore, the proposed method provides detection accuracy similar to or even better than that of state-of-the-art techniques.
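
As a concrete illustration of this pipeline, the following sketch computes a simplified TD2DHOG-style feature: a per-cell orientation histogram map (standing in for the 2DHOG features), a 2DDCT over the cell grid, and a low-frequency truncation. Python/NumPy/SciPy is used here purely for illustration (the authors' implementation builds on the MATLAB toolbox of Dollár 2016; see the notes below), and the cell size, bin count, and truncation sizes c1 and c2 are placeholder values, not those used in the paper.

    import numpy as np
    from scipy.fft import dctn

    def td2dhog(img, cell=8, bins=9, c1=8, c2=8):
        """Simplified TD2DHOG-style sketch: per-cell orientation
        histograms (a 2DHOG-like map), 2DDCT over the cell grid,
        and truncation to the low-frequency block."""
        gy, gx = np.gradient(img.astype(float))
        mag = np.hypot(gx, gy)
        ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation
        ny, nx = img.shape[0] // cell, img.shape[1] // cell
        b = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
        hog2d = np.zeros((ny, nx, bins))          # 2D map per orientation bin
        for i in range(ny):
            for j in range(nx):
                sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
                hog2d[i, j] = np.bincount(b[sl].ravel(),
                                          weights=mag[sl].ravel(),
                                          minlength=bins)
        F = dctn(hog2d, axes=(0, 1), norm='ortho')  # 2DDCT per bin
        return F[:c1, :c2, :]                       # TD2DHOG features

By the multiplicative-factor property derived in the appendix, features computed this way from an image and from a downsampled version of it differ only by a known scale, which is what allows a single classifier to serve all resolutions.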


Notes

  1. The vectorization function is defined as Mat2Vec: \(\mathbb {R}^{\mu \times \nu } \rightarrow \mathbb {R}^{\rho }\), where \(\rho = \mu \nu \) is the dimension of the vector and \((\mu \times \nu )\) is the order of the input matrix. The inverse of the vectorization function is defined as Vec2Mat: \( \mathbb {R}^{\rho } \rightarrow \mathbb {R}^{\mu \times \nu }\). (A minimal code sketch of these two operators is given after this list.)

  2. The toolbox (Dollár 2016) has been used to calculate the 2DHOG.

  3. The MATLAB function lsqcurvefit is used, http://www.mathworks.com/help/optim/ug/lsqcurvefit.html.

  4. Measured on a modern computer with a 2.9 GHz CPU and 8 GB of RAM.
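
For concreteness, here is a minimal sketch of the Mat2Vec and Vec2Mat operators of note 1. Python is used purely for illustration; the stacking order is not specified in the text, so column stacking is assumed:

    import numpy as np

    def mat2vec(A):
        """Mat2Vec: R^(mu x nu) -> R^(rho), rho = mu * nu.
        Column stacking (Fortran order) is assumed."""
        return A.flatten(order='F')

    def vec2mat(v, mu, nu):
        """Vec2Mat: R^(rho) -> R^(mu x nu), the inverse of mat2vec."""
        return v.reshape((mu, nu), order='F')

    A = np.arange(6).reshape(2, 3)
    assert np.array_equal(vec2mat(mat2vec(A), 2, 3), A)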

References

  • Agarwal, S., Awan, A., & Roth, D. (2004). Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 26(11), 1475–1490. http://cogcomp.org/page/resource_view/13/. Accessed November 1, 2018.

  • Ahmed, N., Natarajan, T., & Rao, K. (1974). Discrete cosine transform. IEEE Transactions on Computers, C–23(1), 90–93.

  • Appel, R., Fuchs, T., Dollár, P., & Perona, P. (2013). Quickly boosting decision trees—Pruning underachieving features early. In Proceedings of the international conference on machine learning (ICML) (pp. 594–602).

  • Benenson, R., Mathias, M., Timofte, R., & Gool, L. V. (2012). Pedestrian detection at 100 frames per second. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2903–2910).

  • Bi, G., & Mitra, S. (2011). Sampling rate conversion in the frequency domain [DSP Tips and Tricks]. IEEE Signal Processing Magazine, 28(3), 140–144.

  • Bileschi, S. (2006). StreetScenes: Towards scene understanding in still images. Ph.D. thesis, Massachusetts Institute of Technology. CBCL dataset link: http://cbcl.mit.edu/software-datasets/streetscenes. Accessed November 1, 2018.

  • Buch, N., Velastin, S. A., & Orwell, J. (2011). A review of computer vision techniques for the analysis of urban traffic. IEEE Transactions on Intelligent Transportation Systems (ITS), 12(3), 920–939.

  • Dalal, N. (2006). Finding people in images and videos. Ph.D. thesis, Institut National Polytechnique de Grenoble.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (vol. 1, pp. 886–893).

  • Dollár, P. (2016). Piotr’s Image and Video Matlab Toolbox (PMT). http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html. Retrieved, November 1, 2018.

  • Dollár, P., Appel, R., Belongie, S., & Perona, P. (2014). Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 36(8), 1532–1545.

  • Dollár, P., Belongie, S., & Perona, P. (2010). The fastest pedestrian detector in the west. In Proceedings of the British machine vision conference (BMVC) (pp. 68.1–68.11).

  • Dollár, P., Tu, Z., Perona, P., & Belongie, S. (2009). Integral channel features. In Proceedings of the British machine vision conference (BMVC) (pp. 91.1–91.11).

  • Dollár, P., Wojek, C., Schiele, B., & Perona, P. (2012). Pedestrian detection: An evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 34(4), 743–761.

  • Dubout, C., & Fleuret, F. (2012). Exact acceleration of linear object detectors. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 301–311).

  • Everingham, M., Gool, L. V., Williams, C. K., Winn, J., & Zisserman, A. (2010). The Pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.

  • Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2016). The Pascal visual object classes challenge 2007 (VOC2007) results. http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html. Retrieved, November 1, 2018.

  • Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 32(9), 1627–1645.

  • Gall, J., & Lempitsky, V. (2009). Class-specific hough forests for object detection. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1022–1029).

  • Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings conference on computer vision and pattern recognition (CVPR) (pp. 3354–3361).

  • Gepperth, A., Rebhan, S., Hasler, S., & Fritsch, J. (2011). Biased competition in visual processing hierarchies: A learning approach using multiple cues. Cognitive Computation, 3(1), 146–166.

  • Huang, J., & Mumford, D. (1999). Statistics of natural images and models. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (vol. 1, pp. 541–547).

  • Kuo, C. H., & Nevatia, R. (2009). Robust multi-view car detection using unsupervised sub-categorization. In Proceedings of the IEEE workshop on applications of computer vision (WACV) (pp. 1–8).

  • Lampert, C. H., Blaschko, M., & Hofmann, T. (2008). Beyond sliding windows: Object localization by efficient subwindow search. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Leibe, B., Leonardis, A., & Schiele, B. (2008). Robust object detection with interleaved categorization and segmentation. International Journal of Computer Vision, 77(1), 259–289.

  • Li, B., Wu, T., & Zhu, S. C. (2014). Integrating context and occlusion for car detection by hierarchical and-or model. In Proceedings of the European conference on computer vision (ECCV) (pp. 652–667).

  • Maji, S., Berg, A. C., & Malik, J. (2008). Classification using intersection kernel support vector machines is efficient. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Maji, S., Berg, A. C., & Malik, J. (2013). Efficient classification for additive kernel SVMs. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 35(1), 66–77.

  • Mutch, J., & Lowe, D. G. (2008). Object class recognition and localization using sparse features with limited receptive fields. International Journal of Computer Vision, 80(1), 45–57.

  • Naiel, M. A., Ahmad, M. O., & Swamy, M. (2015). Vehicle detection using approximation of feature pyramids in the DFT domain. In Proceedings of the international conference on image analysis and recognition (ICIAR) (pp. 429–436). Springer.

  • Naiel, M. A., Ahmad, M. O., & Swamy, M. N. S. (2014). Vehicle detection using TD2DHOG features. In Proceedings of New circuits and systems conference (NewCAS) (pp. 389–392).

  • Ohn-Bar, E., & Trivedi, M. M. (2015). Learning to detect vehicles by clustering appearance patterns. IEEE Transactions on Intelligent Transportation Systems (ITS), 16(5), 2511–2521.

  • Papageorgiou, C. P., Oren, M., & Poggio, T. (1998). A general framework for object detection. In Proceedings of the sixth IEEE international conference on computer vision (ICCV) (pp. 555–562).

  • Pepikj, B., Stark, M., Gehler, P., & Schiele, B. (2013). Occlusion patterns for object class detection. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3286–3293).

  • Ruderman, D. L. (1994). The statistics of natural images. Network: Computation in Neural Systems, 5, 517–548.

  • Sivaraman, S., & Trivedi, M. (2010). A general active-learning framework for on-road vehicle recognition and tracking. IEEE Transactions on Intelligent Transportation Systems (ITS), 11(2), 267–276.

  • Sivaraman, S., & Trivedi, M. (2013a). Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis. IEEE Transactions on Intelligent Transportation Systems (ITS), 14(4), 1773–1795.

  • Sivaraman, S., & Trivedi, M. (2013b). Vehicle detection by independent parts for urban driver assistance. IEEE Transactions on Intelligent Transportation Systems (ITS), 14(4), 1597–1608.

  • Smith, J. O. (2007). Mathematics of the discrete Fourier transform (DFT) (2nd ed.). W3K Publishing.

  • Takeuchi, A., Mita, S., & McAllester, D. (2010). On-road vehicle tracking using deformable object model and particle filter with integrated likelihoods. In Proceedings of the IEEE intelligent vehicles symposium (IV) (pp. 1014–1021).

  • Wang, C., Fang, Y., Zhao, H., Guo, C., Mita, S., & Zha, H. (2016). Probabilistic inference for occluded and multiview on-road vehicle detection. IEEE Transactions on Intelligent Transportation Systems (ITS), 17(1).

  • Wang, X., Han, T. X., & Yan, S. (2009). An HOG-LBP human detector with partial occlusion handling. In Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 32–39).

  • Wang, X., Yang, M., Zhu, S., & Lin, Y. (2015). Regionlets for generic object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 37(10), 2071–2084.

  • Wu, B., & Nevatia, R. (2007). Cluster boosted tree classifier for multi-view, multi-pose object detection. In Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 1–8).

  • Wu, B. F., Kao, C. C., Jen, C. L., Li, Y. F., Chen, Y. H., & Juang, J. H. (2014). A relative-discriminative-histogram-of-oriented-gradients-based particle filter approach to vehicle occlusion handling and tracking. IEEE Transactions on Industrial Electronics, 61, 4228–4237.

  • Wu, J., Liu, N., Geyer, C., & Rehg, J. (2013). \(\text{C}^4\): A real-time object detection framework. IEEE Transactions on Image Processing, 22(10), 4096–4107.

  • Xiang, Y., Choi, W., Lin, Y., & Savarese, S. (2015). Data-driven 3D voxel patterns for object category recognition. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1903–1911).

  • Yang, J., Zhang, D., Frangi, A. F., & Yang, J. Y. (2004). Two dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 26(1), 131–137.

Acknowledgements

M. A. Naiel would like to acknowledge the support from Concordia University to conduct this research. This work is supported by research grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada and the Regroupement Stratégique en Microsystèmes du Québec (ReSMiQ) awarded to M. O. Ahmad and M. N. S. Swamy.

Author information

Corresponding author

Correspondence to M. Omair Ahmad.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Derivation of Equation (21)

The 2DDCT of a grayscale image x of size \((N \times M)\) in the spatial domain is given by

$$X_{N,M}[u,v] = \hat{\varGamma }_{N}[u]\,\hat{\varGamma }_{M}[v]\sum _{m=0}^{M-1}\sum _{n=0}^{N-1}x[n,m]\, \cos \left( \frac{\pi (2n+1)u}{2N}\right) \cos \left( \frac{\pi (2m+1)v}{2M}\right) $$
(1)

where \(0\le u \le N-1\), \(0\le v \le M-1\), \(\hat{\varGamma }_{N}[k]=\sqrt{1/N}\) for \(k=0\), and \(\hat{\varGamma }_{N}[k]=\sqrt{2/N}\) for \(0< k \le N-1\). Let N and M be even multiples of \(K_1\) and \(K_2\), respectively, where \(K_1\) and \(K_2\) are the downsampling factors in the y and x directions, respectively. Let \(x[n,m]\) be a bandlimited sequence, and let the sequence y of size \((2N \times 2M)\) be defined as

$$y[n,m]=\begin{cases} x[n,m], & 0\le n \le N-1,\ 0\le m \le M-1\\ 0, & \text{otherwise} \end{cases}$$
(2)

The \(N \times M\)-point 2DDCT can be computed via a \(2N \times 2M\)-point 2DDFT of the sequence \(y[n,m]\) as follows. First, the 2DDFT is applied to \(y[n,m]\) in order to obtain \(Y_{2N,2M}\). As in the 1DDCT case, the relation between the 2DDCT-domain signal \(X_{N,M}[u,v]\) and \(Y_{2N,2M}[u,v]\) can be expressed as

$$X_{N,M}[u,v]=\hat{\varGamma }_{N}[u]\,\hat{\varGamma }_{M}[v]\, \mathrm{Re} \left( Y_{2N,2M}[u,v]\, e^{-j\left( \frac{\pi u}{2N}+\frac{\pi v}{2M}\right) }\right) $$
(3)

where \(0\le u \le N-1\), \(0\le v \le M-1\). Let \(c_1\) and \(c_2\) denote the maximum frequencies retained by the truncation operator, where \(c_1 < \hat{N}\), \(c_2 <\hat{M}\), \(\hat{N}=N/K_1\), and \(\hat{M}= M/K_2\). Assume \(Y_{2N,2M}\) is bandlimited to the maximum frequencies \((\hat{N}, \hat{M})\). Then, the downsampled signal in the 2DDCT domain, \(\hat{X}_{\hat{N},\hat{M}}\), can be obtained as

$$\hat{X}_{\hat{N},\hat{M}}[u,v]=\frac{1}{K_1 K_2}\, \hat{\varGamma }_{\hat{N}}[u]\,\hat{\varGamma }_{\hat{M}}[v]\, \mathrm{Re}\left( Y_{2N,2M}[u,v]\, e^{-j\left( \frac{\pi u}{2N}+\frac{\pi v}{2M}\right) }\right) $$
(4)
$$= \frac{\hat{\varGamma }_{\hat{N}}[u]\,\hat{\varGamma }_{\hat{M}}[v]}{K_1 K_2\,\hat{\varGamma }_{N}[u]\,\hat{\varGamma }_{M}[v]}\; \hat{\varGamma }_{N}[u]\,\hat{\varGamma }_{M}[v]\, \mathrm{Re}\left( Y_{2N,2M}[u,v]\, e^{-j\left( \frac{\pi u}{2N}+\frac{\pi v}{2M}\right) }\right) $$
(5)
$$=\frac{\sqrt{1/\hat{N}}\,\sqrt{1/\hat{M}}}{K_1 K_2\, \sqrt{1/N}\,\sqrt{1/M}}\, X_{N,M}[u,v] $$
(6)
$$= \frac{1}{\sqrt{K_1 K_2}}\, X_{N,M}[u,v] $$
(7)

where \(0\le u \le c_1-1\) and \(0\le v \le c_2-1\). Thus, the relation between the 2DDCT coefficients of the original image and those of its downsampled version is given by

$$X_{N,M}[u,v]= \sqrt{K_1 K_2}\; \hat{X}_{\hat{N},\hat{M}}[u,v] $$
(8)
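
The result in Eq. (8) is easy to check numerically. The following sketch (an illustration under the bandlimitedness assumption above, not the authors' code; the image size, smoothing strength, and retained frequencies are arbitrary choices) compares the low-frequency 2DDCT coefficients of a smooth image with those of its downsampled version:

    import numpy as np
    from scipy.fft import dctn
    from scipy.ndimage import gaussian_filter

    N, M, K1, K2, c1, c2 = 128, 128, 2, 2, 8, 8
    rng = np.random.default_rng(0)

    # A heavily smoothed image approximates the bandlimited assumption.
    x = gaussian_filter(rng.standard_normal((N, M)), sigma=8)
    x_ds = x[::K1, ::K2]                      # downsample by (K1, K2)

    X = dctn(x, norm='ortho')                 # X_{N,M}
    X_hat = dctn(x_ds, norm='ortho')          # \hat{X}_{\hat{N},\hat{M}}

    lhs = X[:c1, :c2]
    rhs = np.sqrt(K1 * K2) * X_hat[:c1, :c2]  # Eq. (8)
    print(np.max(np.abs(lhs - rhs)) / np.max(np.abs(lhs)))  # small relative error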

Cite this article

Naiel, M.A., Ahmad, M.O. & Swamy, M.N.S. A vehicle detection scheme based on two-dimensional HOG features in the DFT and DCT domains. Multidim Syst Sign Process 30, 1697–1729 (2019). https://doi.org/10.1007/s11045-018-0621-1
