
A vehicle detection scheme based on two-dimensional HOG features in the DFT and DCT domains

Abstract

Histograms of oriented gradients (HOG) are often used as features for object detection in images, since they are robust to changes in illumination and environmental conditions. However, these features are not invariant to changes in the resolution of the input image. A 2D representation of these features, referred to as 2DHOG features, has been used instead, since it preserves the relations among neighboring pixels or cells. In this paper, a new vehicle detection scheme using transform-domain 2DHOG features is proposed. The method is based on extracting the 2DHOG features from the input image and applying to them the 2D discrete Fourier transform (DFT) or the 2D discrete cosine transform (DCT). This is followed by a truncation process through which only the low-frequency coefficients, referred to as the transform-domain 2DHOG (TD2DHOG) features, are retained. It is shown that the TD2DHOG features obtained from an image at its original resolution and those obtained from a downsampled version of the same image are approximately equal within a multiplicative factor. This property is then utilized in the proposed scheme to detect vehicles at various resolutions using a single classifier rather than multiple resolution-specific classifiers. Experimental results show that using a single classifier in the proposed detection scheme drastically reduces the training and storage cost compared with a classifier pyramid, while providing detection accuracy similar to that obtained using TD2DHOG features with a classifier pyramid. Furthermore, the proposed method provides detection accuracy similar to or even better than that of state-of-the-art techniques.
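
As a concrete illustration of this pipeline, the following sketch computes a simplified TD2DHOG-style feature: a per-cell orientation histogram map (standing in for the 2DHOG features), a 2DDCT over the cell grid, and a low-frequency truncation. Python/NumPy/SciPy is used here purely for illustration (the authors' implementation builds on the MATLAB toolbox of Dollár 2016; see the notes below), and the cell size, bin count, and truncation sizes c1 and c2 are placeholder values, not those used in the paper.

    import numpy as np
    from scipy.fft import dctn

    def td2dhog(img, cell=8, bins=9, c1=8, c2=8):
        """Simplified TD2DHOG-style sketch: per-cell orientation
        histograms (a 2DHOG-like map), 2DDCT over the cell grid,
        and truncation to the low-frequency block."""
        gy, gx = np.gradient(img.astype(float))
        mag = np.hypot(gx, gy)
        ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned orientation
        ny, nx = img.shape[0] // cell, img.shape[1] // cell
        b = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
        hog2d = np.zeros((ny, nx, bins))          # 2D map per orientation bin
        for i in range(ny):
            for j in range(nx):
                sl = np.s_[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
                hog2d[i, j] = np.bincount(b[sl].ravel(),
                                          weights=mag[sl].ravel(),
                                          minlength=bins)
        F = dctn(hog2d, axes=(0, 1), norm='ortho')  # 2DDCT per bin
        return F[:c1, :c2, :]                       # TD2DHOG features

By the multiplicative-factor property derived in the appendix, features computed this way from an image and from a downsampled version of it differ only by a known scale, which is what allows a single classifier to serve all resolutions.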


Notes

  1. The vectorization function is defined as Mat2Vec: \(\mathbb {R}^{\mu \times \nu } \rightarrow \mathbb {R}^{\rho }\), where \(\rho = \mu \nu \) is the dimension of the vector and \((\mu \times \nu )\) is the order of the input matrix. The inverse of the vectorization function is defined as Vec2Mat: \( \mathbb {R}^{\rho } \rightarrow \mathbb {R}^{\mu \times \nu }\). (A minimal code sketch of these two operators is given after this list.)

  2. The toolbox (Dollár 2016) has been used to calculate the 2DHOG.

  3. The MATLAB function lsqcurvefit is used, http://www.mathworks.com/help/optim/ug/lsqcurvefit.html.

  4. Measured on a modern computer with a 2.9 GHz CPU and 8 GB of RAM.
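
For concreteness, here is a minimal sketch of the Mat2Vec and Vec2Mat operators of note 1. Python is used purely for illustration; the stacking order is not specified in the text, so column stacking is assumed:

    import numpy as np

    def mat2vec(A):
        """Mat2Vec: R^(mu x nu) -> R^(rho), rho = mu * nu.
        Column stacking (Fortran order) is assumed."""
        return A.flatten(order='F')

    def vec2mat(v, mu, nu):
        """Vec2Mat: R^(rho) -> R^(mu x nu), the inverse of mat2vec."""
        return v.reshape((mu, nu), order='F')

    A = np.arange(6).reshape(2, 3)
    assert np.array_equal(vec2mat(mat2vec(A), 2, 3), A)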

References

  • Agarwal, S., Awan, A., & Roth, D. (2004). Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 26(11), 1475–1490. http://cogcomp.org/page/resource_view/13/. Accessed November 1, 2018.

  • Ahmed, N., Natarajan, T., & Rao, K. (1974). Discrete cosine transform. IEEE Transactions on Computers, C–23(1), 90–93.

  • Appel, R., Fuchs, T., Dollár, P., & Perona, P. (2013). Quickly boosting decision trees—Pruning underachieving features early. In Proceedings of the international conference on machine learning (ICML) (pp. 594–602).

  • Benenson, R., Mathias, M., Timofte, R., & Gool, L. V. (2012). Pedestrian detection at 100 frames per second. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 2903–2910).

  • Bi, G., & Mitra, S. (2011). Sampling rate conversion in the frequency domain [DSP Tips and Tricks]. IEEE Signal Processing Magazine, 28(3), 140–144.

  • Bileschi, S. (2006). StreetScenes: Towards scene understanding in still images. Ph.D. thesis, Massachusetts Institute of Technology. CBCL dataset link: http://cbcl.mit.edu/software-datasets/streetscenes. Accessed November 1, 2018.

  • Buch, N., Velastin, S. A., & Orwell, J. (2011). A review of computer vision techniques for the analysis of urban traffic. IEEE Transactions on Intelligent Transportation Systems (ITS), 12(3), 920–939.

  • Dalal, N. (2006). Finding people in images and videos. Ph.D. thesis, Institut National Polytechnique de Grenoble.

  • Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (vol. 1, pp. 886–893).

  • Dollár, P. (2016). Piotr’s Image and Video Matlab Toolbox (PMT). http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html. Retrieved, November 1, 2018.

  • Dollár, P., Appel, R., Belongie, S., & Perona, P. (2014). Fast feature pyramids for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 36(8), 1532–1545.

  • Dollár, P., Belongie, S., & Perona, P. (2010). The fastest pedestrian detector in the west. In Proceedings of the British machine vision conference (BMVC) (pp. 68.1–68.11).

  • Dollár, P., Tu, Z., Perona, P., & Belongie, S. (2009). Integral channel features. In Proceedings of the British machine vision conference (BMVC) (pp. 91.1–91.11).

  • Dollár, P., Wojek, C., Schiele, B., & Perona, P. (2012). Pedestrian detection: An evaluation of the state of the art. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 34(4), 743–761.

  • Dubout, C., & Fleuret, F. (2012). Exact acceleration of linear object detectors. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 301–311).

  • Everingham, M., Gool, L. V., Williams, C. K., Winn, J., & Zisserman, A. (2010). The Pascal visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.

  • Everingham, M., Gool, L. V., Williams, C. K. I., Winn, J., & Zisserman, A. (2016). The Pascal visual object classes challenge 2007 (VOC2007) results. http://host.robots.ox.ac.uk/pascal/VOC/voc2007/index.html. Retrieved, November 1, 2018.

  • Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 32(9), 1627–1645.

  • Gall, J., & Lempitsky, V. (2009). Class-specific hough forests for object detection. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1022–1029).

  • Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings conference on computer vision and pattern recognition (CVPR) (pp. 3354–3361).

  • Gepperth, A., Rebhan, S., Hasler, S., & Fritsch, J. (2011). Biased competition in visual processing hierarchies: A learning approach using multiple cues. Cognitive Computation, 3(1), 146–166.

  • Huang, J., & Mumford, D. (1999). Statistics of natural images and models. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (vol. 1, pp. 541–547).

  • Kuo, C. H., & Nevatia, R. (2009). Robust multi-view car detection using unsupervised sub-categorization. In Proceedings of the IEEE workshop on applications of computer vision (WACV) (pp. 1–8).

  • Lampert, C. H., Blaschko, M., & Hofmann, T. (2008). Beyond sliding windows: Object localization by efficient subwindow search. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Leibe, B., Leonardis, A., & Schiele, B. (2008). Robust object detection with interleaved categorization and segmentation. International Journal of Computer Vision, 77(1), 259–289.

  • Li, B., Wu, T., & Zhu, S. C. (2014). Integrating context and occlusion for car detection by hierarchical and-or model. In Proceedings of the European conference on computer vision (ECCV) (pp. 652–667).

  • Maji, S., Berg, A. C., & Malik, J. (2008). Classification using intersection kernel support vector machines is efficient. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1–8).

  • Maji, S., Berg, A. C., & Malik, J. (2013). Efficient classification for additive kernel SVMs. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 35(1), 66–77.

  • Mutch, J., & Lowe, D. G. (2008). Object class recognition and localization using sparse features with limited receptive fields. International Journal of Computer Vision, 80(1), 45–57.

  • Naiel, M. A., Ahmad, M. O., & Swamy, M. (2015). Vehicle detection using approximation of feature pyramids in the DFT domain. In Proceedings of the international conference on image analysis and recognition (ICIAR) (pp. 429–436). Springer.

  • Naiel, M. A., Ahmad, M. O., & Swamy, M. N. S. (2014). Vehicle detection using TD2DHOG features. In Proceedings of New circuits and systems conference (NewCAS) (pp. 389–392).

  • Ohn-Bar, E., & Trivedi, M. M. (2015). Learning to detect vehicles by clustering appearance patterns. IEEE Transactions on Intelligent Transportation Systems (ITS), 16(5), 2511–2521.

  • Papageorgiou, C. P., Oren, M., & Poggio, T. (1998). A general framework for object detection. In Proceedings of the sixth IEEE international conference on computer vision (ICCV) (pp. 555–562).

  • Pepikj, B., Stark, M., Gehler, P., & Schiele, B. (2013). Occlusion patterns for object class detection. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 3286–3293).

  • Ruderman, D. L. (1994). The statistics of natural images. Network: Computation in Neural Systems, 5, 517–548.

  • Sivaraman, S., & Trivedi, M. (2010). A general active-learning framework for on-road vehicle recognition and tracking. IEEE Transactions on Intelligent Transportation Systems (ITS), 11(2), 267–276.

  • Sivaraman, S., & Trivedi, M. (2013a). Looking at vehicles on the road: A survey of vision-based vehicle detection, tracking, and behavior analysis. IEEE Transactions on Intelligent Transportation Systems (ITS), 14(4), 1773–1795.

  • Sivaraman, S., & Trivedi, M. (2013b). Vehicle detection by independent parts for urban driver assistance. IEEE Transactions on Intelligent Transportation Systems (ITS), 14(4), 1597–1608.

  • Smith, J. O. (2007). Mathematics of the discrete Fourier transform (DFT) (2nd ed.). W3K Publishing.

  • Takeuchi, A., Mita, S., & McAllester, D. (2010). On-road vehicle tracking using deformable object model and particle filter with integrated likelihoods. In Proceedings of the IEEE intelligent vehicles symposium (IV) (pp. 1014–1021).

  • Wang, C., Fang, Y., Zhao, H., Guo, C., Mita, S., & Zha, H. (2016). Probabilistic inference for occluded and multiview on-road vehicle detection. IEEE Transactions on Intelligent Transportation Systems (ITS), 17(1).

  • Wang, X., Han, T. X., & Yan, S. (2009). An HOG-LBP human detector with partial occlusion handling. In Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 32–39).

  • Wang, X., Yang, M., Zhu, S., & Lin, Y. (2015). Regionlets for generic object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 37(10), 2071–2084.

  • Wu, B., & Nevatia, R. (2007). Cluster boosted tree classifier for multi-view, multi-pose object detection. In Proceedings of the IEEE international conference on computer vision (ICCV) (pp. 1–8).

  • Wu, B. F., Kao, C. C., Jen, C. L., Li, Y. F., Chen, Y. H., & Juang, J. H. (2014). A relative-discriminative-histogram-of-oriented-gradients-based particle filter approach to vehicle occlusion handling and tracking. IEEE Transactions on Industrial Electronics, 61, 4228–4237.

  • Wu, J., Liu, N., Geyer, C., & Rehg, J. (2013). \(\text{C}^4\): A real-time object detection framework. IEEE Transactions on Image Processing, 22(10), 4096–4107.

  • Xiang, Y., Choi, W., Lin, Y., & Savarese, S. (2015). Data-driven 3D voxel patterns for object category recognition. In Proceedings IEEE conference on computer vision and pattern recognition (CVPR) (pp. 1903–1911).

  • Yang, J., Zhang, D., Frangi, A. F., & Yang, J. Y. (2004). Two dimensional PCA: A new approach to appearance-based face representation and recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 26(1), 131–137.

Acknowledgements

M. A. Naiel would like to acknowledge the support from Concordia University to conduct this research. This work is supported by research grants from the Natural Sciences and Engineering Research Council (NSERC) of Canada and the Regroupement Stratégique en Microsystèmes du Québec (ReSMiQ) awarded to M. O. Ahmad and M. N. S. Swamy.

Author information

Corresponding author

Correspondence to M. Omair Ahmad.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Derivation of Equation (21)

The 2DDCT of a grayscale image x of size \((N \times M)\) in the spatial domain is given by

$$X_{N,M}[u,v] = \hat{\varGamma }_{N}[u]\,\hat{\varGamma }_{M}[v]\sum _{m=0}^{M-1}\sum _{n=0}^{N-1}x[n,m]\, \cos \left( \frac{\pi (2n+1)u}{2N}\right) \cos \left( \frac{\pi (2m+1)v}{2M}\right) $$
(1)

where \(0\le u \le N-1\), \(0\le v \le M-1\), \(\hat{\varGamma }_{N}[k]=\sqrt{1/N}\) for \(k=0\), and \(\hat{\varGamma }_{N}[k]=\sqrt{2/N}\) for \(0< k \le N-1\). Let N and M be even multiples of \(K_1\) and \(K_2\), respectively, where \(K_1\) and \(K_2\) are the downsampling factors in the y and x directions, respectively. Let \(x[n,m]\) be a bandlimited sequence, and let the sequence y of size \((2N \times 2M)\) be defined as

$$y[n,m]=\begin{cases} x[n,m], & 0\le n \le N-1,\ 0\le m \le M-1\\ 0, & \text{otherwise} \end{cases}$$
(2)

The \(N \times M\)-point 2DDCT can be computed via a \(2N \times 2M\)-point 2DDFT of the sequence \(y[n,m]\) as follows. First, the 2DDFT is applied to \(y[n,m]\) in order to obtain \(Y_{2N,2M}\). As in the 1DDCT case, the relation between the 2DDCT-domain signal \(X_{N,M}[u,v]\) and \(Y_{2N,2M}[u,v]\) can be expressed as

$$X_{N,M}[u,v]=\hat{\varGamma }_{N}[u]\,\hat{\varGamma }_{M}[v]\, \mathrm{Re} \left( Y_{2N,2M}[u,v]\, e^{-j\left( \frac{\pi u}{2N}+\frac{\pi v}{2M}\right) }\right) $$
(3)

where \(0\le u \le N-1\), \(0\le v \le M-1\). Let \(c_1\) and \(c_2\) denote the maximum frequencies retained by the truncation operator, where \(c_1 < \hat{N}\), \(c_2 <\hat{M}\), \(\hat{N}=N/K_1\), and \(\hat{M}= M/K_2\). Assume \(Y_{2N,2M}\) is bandlimited to the maximum frequencies \((\hat{N}, \hat{M})\). Then, the downsampled signal in the 2DDCT domain, \(\hat{X}_{\hat{N},\hat{M}}\), can be obtained as

$$\hat{X}_{\hat{N},\hat{M}}[u,v]=\frac{1}{K_1 K_2}\, \hat{\varGamma }_{\hat{N}}[u]\,\hat{\varGamma }_{\hat{M}}[v]\, \mathrm{Re}\left( Y_{2N,2M}[u,v]\, e^{-j\left( \frac{\pi u}{2N}+\frac{\pi v}{2M}\right) }\right) $$
(4)
$$= \frac{\hat{\varGamma }_{\hat{N}}[u]\,\hat{\varGamma }_{\hat{M}}[v]}{K_1 K_2\,\hat{\varGamma }_{N}[u]\,\hat{\varGamma }_{M}[v]}\; \hat{\varGamma }_{N}[u]\,\hat{\varGamma }_{M}[v]\, \mathrm{Re}\left( Y_{2N,2M}[u,v]\, e^{-j\left( \frac{\pi u}{2N}+\frac{\pi v}{2M}\right) }\right) $$
(5)
$$=\frac{\sqrt{1/\hat{N}}\,\sqrt{1/\hat{M}}}{K_1 K_2\, \sqrt{1/N}\,\sqrt{1/M}}\, X_{N,M}[u,v] $$
(6)
$$= \frac{1}{\sqrt{K_1 K_2}}\, X_{N,M}[u,v] $$
(7)

where \(0\le u \le c_1-1\) and \(0\le v \le c_2-1\). Thus, the relation between the 2DDCT coefficients of the original image and those of its downsampled version is given by

$$X_{N,M}[u,v]= \sqrt{K_1 K_2}\; \hat{X}_{\hat{N},\hat{M}}[u,v] $$
(8)
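
The result in Eq. (8) is easy to check numerically. The following sketch (an illustration under the bandlimitedness assumption above, not the authors' code; the image size, smoothing strength, and retained frequencies are arbitrary choices) compares the low-frequency 2DDCT coefficients of a smooth image with those of its downsampled version:

    import numpy as np
    from scipy.fft import dctn
    from scipy.ndimage import gaussian_filter

    N, M, K1, K2, c1, c2 = 128, 128, 2, 2, 8, 8
    rng = np.random.default_rng(0)

    # A heavily smoothed image approximates the bandlimited assumption.
    x = gaussian_filter(rng.standard_normal((N, M)), sigma=8)
    x_ds = x[::K1, ::K2]                      # downsample by (K1, K2)

    X = dctn(x, norm='ortho')                 # X_{N,M}
    X_hat = dctn(x_ds, norm='ortho')          # \hat{X}_{\hat{N},\hat{M}}

    lhs = X[:c1, :c2]
    rhs = np.sqrt(K1 * K2) * X_hat[:c1, :c2]  # Eq. (8)
    print(np.max(np.abs(lhs - rhs)) / np.max(np.abs(lhs)))  # small relative error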

Cite this article

Naiel, M.A., Ahmad, M.O. & Swamy, M.N.S. A vehicle detection scheme based on two-dimensional HOG features in the DFT and DCT domains. Multidim Syst Sign Process 30, 1697–1729 (2019). https://doi.org/10.1007/s11045-018-0621-1
