## Abstract

The histogram of oriented gradients (HOG) is widely used for image description and has proven very effective. In many vision problems, rotation-invariant analysis is necessary or preferred. Popular solutions are mainly based on pose normalization or learning, neglecting some intrinsic properties of rotations. This paper presents a method to build rotation-invariant HOG descriptors using Fourier analysis in polar/spherical coordinates, which is closely related to the irreducible representations of the 2D/3D rotation groups. This is achieved by considering a gradient histogram as a continuous angular signal which can be well represented by the Fourier basis (2D) or spherical harmonics (3D). As rotation-invariance is established in an analytical way, we can avoid discretization artifacts and create a continuous mapping from the image to the feature space. In the experiments, we first show that our method outperforms the state of the art on a public dataset for a car detection task in aerial images. We further use the Princeton Shape Benchmark and the SHREC 2009 Generic Shape Benchmark to demonstrate the high performance of our method for similarity measures of 3D shapes. Finally, we show an application on microscopic volumetric data.

## Notes

- 1.
In this paper, a quantity that describes certain image content is generally called a *feature*; a single gradient histogram computed in a local patch is referred to as a *HOG cell*; an assembled feature vector that describes a region of multiple cells is referred to as a *HOG descriptor*.

- 2.
- 3.
In this paper, we do not rely on this *polar tensor* concept, because we do not need any special mathematical tools for the related analysis of 2D images.

- 4.
We purposely define the expansion coefficients with a conjugation, which makes it a standard inner product between the coefficients and SH basis. The same convention is used in Reisert and Burkhardt (2009). The advantage is that this linear expansion can be understood as a coupling between two spherical tensors, which will be explained later.

- 5.
This operator is written as \(\circ _\ell \) in Reisert and Burkhardt (2009), since \(\ell _1, \ell _2\) can be inferred from the two coupled tensors. In this paper we use the more explicit notation \({\otimes }_{(\ell |\ell _1,\ell _2)}\).

- 6.
- 7.
The coupling used here is only a portion of all possible combinations. We prefer these simple choices since we only want to demonstrate the descriptive power of the proposed method. We believe that the optimal feature selection is application-dependent. With a classifier such as a linear SVM or Random Forest, which has a built-in feature selection ability, the dimensionality of the feature vector can be increased by adding more coupled features.

- 8.
Patrick Min, https://www.google.com/search?q=binvox

- 9.
We created the ground-truth by editing a watershed segmentation result manually. Some very badly segmented regions were discarded and were not used for training.

## References

Ahonen, T., Matas, J., He, C., Pietikäinen, M. (2009). *Rotation invariant image description with local binary pattern histogram Fourier features*. In Scandinavian Conference on Image Analysis, pp. 61–70.

Akgül, C., Axenopoulos, A., Bustos, B., Chaouch, M., Daras, P., Dutagaci, H., Furuya, T., Godil, A., Kreft, S., Lian, Z., et al. (2009). *SHREC 2009-Generic Shape Retrieval contest*. In Eurographics workshop on 3D object retrieval.

Allaire, S., Kim, J., Breen, S., Jaffray, D., & Pekar, V. (2008). *Full orientation invariance and improved feature selectivity of 3D SIFT with application to medical image analysis*. In CVPR Workshops.

Arsenault, H., & Sheng, Y. (1986). Properties of the circular harmonic expansion for rotation-invariant pattern recognition. *Applied Optics*, *25*(18), 3225–3229.

Bendale, P., Triggs, B., & Kingsbury, N. (2010). *Multiscale keypoint analysis based on complex wavelets*. In British Machine Vision Conference, pp. *49*(1–49), 10.

Bourdev, L., Malik, J. (2009). *Poselets: Body part detectors trained using 3D human pose annotations*. In International Conference on Computer Vision, pp. 1365–1372.

Breiman, L. (2001). Random forests. *Machine Learning*, *45*(1), 5–32.

Brink, D., & Satchler, G. (1968). *Angular momentum*. Oxford: Clarendon Press.

Bülow, T. (2004). Spherical diffusion for 3D surface smoothing. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *26*(12), 1650–1654.

Burkhardt, H., & Siggelkow, S. (2001). Invariant features in pattern recognition – fundamentals and applications. In C. Kotropoulos & I. Pitas (Eds.), *Nonlinear model-based image/video processing and analysis* (pp. 269–307). New York: Wiley.

Chang, C.-C., Lin, C.-J. (2011). LIBSVM: A library for support vector machines. *ACM Transactions on Intelligent Systems and Technology*, *2*, 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/cjlin/libsvm

Cortes, C., & Vapnik, V. (1995). Support-vector networks. *Machine Learning*, *20*(3), 273–297.

Dalal, N., Triggs, B. (2005). *Histograms of oriented gradients for human detection*. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 886–893.

Driscoll, J., & Healy, D. (1994). Computing Fourier transforms and convolutions on the 2-sphere. *Advances in Applied Mathematics*, *15*(2), 202–250.

Fan, R., Chang, K., Hsieh, C., Wang, X., & Lin, C. (2008). LIBLINEAR: A library for large linear classification. *The Journal of Machine Learning Research*, *9*, 1871–1874.

Fehr, J. (2010). *Local rotation invariant patch descriptors for 3D vector fields*. In International Conference on Pattern Recognition, pp. 1381–1384.

Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *32*(9), 1627–1645.

Flitton, G., Breckon, T., & Megherbi, N. (2010). *Object recognition using 3D SIFT in complex CT volumes*. In British Machine Vision Conference, pp. 11(1–11), 12.

Fornasier, M., & Toniolo, D. (2005). Fast, robust and efficient 2D pattern recognition for re-assembling fragmented images. *Pattern Recognition*, *38*(11), 2074–2087.

Förstner, W., Gülch, E. (1987). *A fast operator for detection and precise location of distinct points, corners and centres of circular features*. In ISPRS intercommission conference on fast processing of photogrammetric data, pp. 281–305.

Freeman, W., & Adelson, E. (1991). The design and use of steerable filters. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *13*(9), 891–906.

Gauglitz, S. (2011). *Improving keypoint orientation assignment*. In British Machine Vision Conference, pp. *93*(1–93), 11.

Giannakis, G. (1989). Signal reconstruction from multiple correlations: frequency- and time-domain approaches. *Journal of Optical Society of America A*, *6*(5), 682–697.

Golub, G., & Van Loan, C. (1996). *Matrix computations*. Baltimore: Johns Hopkins Univ Press.

Green, R. (2003). Spherical harmonic lighting: The gritty details. In *Game Developers Conference*, *2*, 2–3.

Haasdonk, B., & Burkhardt, H. (2007). Invariant kernel functions for pattern analysis and machine learning. *Machine Learning*, *68*(1), 35–61.

Heitz, G., Koller, D. (2008). *Learning spatial context: Using stuff to find things*. In European Conference on Computer Vision, pp. 30–43.

Jacovitti, G., & Neri, A. (2000). Multiresolution circular harmonic decomposition. *IEEE Transaction on Signal Processing*, *48*(11), 3242–3247.

Kavukcuoglu, K., Ranzato, M., Fergus, R., LeCun, Y. (2009). *Learning invariant features through topographic filter maps*. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1605–1612.

Kazhdan, M., Funkhouser, T., Rusinkiewicz, S. (2003). *Rotation invariant spherical harmonic representation of 3D shape descriptors*. In Eurographics/ACM SIGGRAPH symposium on Geometry processing, pp. 156–164.

Kläser, A., Marszałek, M., Schmid, C. (2008). *A spatio-temporal descriptor based on 3D-gradients*. In British Machine Vision Conference, pp. 995–1004.

Knopp, J., Prasad, M., Van Gool, L. (2010a). *Orientation invariant 3D object classification using Hough transform based methods*. In ACM Multimedia Workshop, pp. 15–20.

Knopp, J., Prasad, M., Willems, G., Timofte, R., Van Gool, L. (2010b). *Hough transform and 3D SURF for robust three dimensional classification*. In European Conference on Computer Vision, pp. 589–602.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. *Proceedings of the IEEE*, *86*(11), 2278–2324.

Lenz, R. (1990). *Group theoretical methods in image processing*. Berlin: Springer.

Lin, W., Liu, L., Matsushita, Y., Low, K., Liu, S. (2012). *Aligning images in the wild*. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8.

Liu, K., Skibbe, H., Schmidt, T., Blein, T., Palme, K., & Ronneberger, O. (2011). *3D rotation-invariant description from tensor operation on spherical HOG field*. In British Machine Vision Conference, pp. *33*(1-33), 12.

Liu, K., Wang, Q., Driever, W., Ronneberger, O. (2012). *2D/3D Rotation-invariant Detection using Equivariant Filters and Kernel Weighted Mapping*. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 917–924.

Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. *International Journal of Computer Vision*, *60*(2), 91–110.

Makadia, A., & Daniilidis, K. (2010). Spherical correlation of visual representations for 3D model retrieval. *International Journal of Computer Vision*, *89*(2), 193–210.

Memisevic, R., & Hinton, G. (2010). Learning to represent spatial transformations with factored higher-order Boltzmann machines. *Neural Computation*, *22*(6), 1473–1492.

Özuysal, M., Calonder, M., Lepetit, V., & Fua, P. (2010). Fast keypoint recognition using random ferns. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *32*(3), 448–461.

Ponce, C., & Singer, A. (2011). Computing steerable principal components of a large set of images and their rotations. *IEEE Transactions on Image Processing*, *20*(11), 3051–3062.

Reisert, M., & Burkhardt, H. (2008a). *Efficient tensor voting with 3D tensorial harmonics*. In CVPR Workshops.

Reisert, M., & Burkhardt, H. (2008b). Equivariant holomorphic filters for contour denoising and rapid object detection. *IEEE Transactions on Image Processing*, *17*(2), 190–203.

Reisert, M., & Burkhardt, H. (2009). Spherical tensor calculus for local adaptive filtering. In S. Aja-Fernández, R. de Luis García, D. Tao, & X. Li (Eds.), *Tensors in image processing and computer vision*, Advances in Pattern Recognition (pp. 153–178). USA: Springer.

Ronneberger, O., Burkhardt, H., & Schultz, E. (2002). *General-purpose Object Recognition in 3D Volume Data Sets using Gray-Scale Invariants – Classification of Airborne Pollen-Grains Recorded with a Confocal Laser Scanning Microscope*. In International Conference on Pattern Recognition, *2*, 290–295.

Ronneberger, O., Liu, K., Rath, M., Ruess, D., Mueller, T., Skibbe, H., et al. (2012). ViBE-Z: a framework for 3D virtual colocalization analysis in zebrafish larval brains. *Nature Methods*, *9*(7), 735–742.

Ronneberger, O., Wang, Q., & Burkhardt, H. (2007). *3D Invariants with High Robustness to Local Deformations for Automated Pollen Recognition*. In DAGM conference on Pattern recognition, pp. 455–435.

Rose, M. (1957). *Elementary theory of angular momentum*. New York: Wiley.

Scherer, M., Walter, M., & Schreck, T. (2010). *Histograms of Oriented Gradients for 3D Model Retrieval*. In International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, pp. 41–48.

Schmidt, T., Keuper, M., Pasternak, T., Palme, K., & Ronneberger, O. (2012). *Modeling of Sparsely Sampled Tubular Surfaces Using Coupled Curves*. In DAGM conference on Pattern recognition, pp. 83–92.

Schmidt, U., Roth, S. (2012). *Learning rotation-aware features: From invariant priors to equivariant descriptors*. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 2050–2057.

Schultz, T., Weickert, J., & Seidel, H. (2009). A higher-order structure tensor. In D. Laidlaw & J. Weickert (Eds.), *Visualization and processing of tensor fields* (pp. 263–279). Berlin: Springer.

Sheng, Y., & Arsenault, H. (1986). Experiments on pattern recognition using invariant Fourier-Mellin descriptors. *Journal of Optical Society of America A*, *3*(6), 771–776.

Shilane, P., Min, P., Kazhdan, M., Funkhouser, T. (2004). *The Princeton Shape Benchmark*. In International Conference on Shape Modeling and Applications, pp. 167–178.

Skibbe, H., & Reisert, M. (2012). *Circular Fourier-HOG features for rotation invariant object detection in biomedical images*. In IEEE International Symposium on Biomedical Imaging, pp. 450–453.

Skibbe, H., Reisert, M., & Burkhardt, H. (2011). *SHOG-spherical HOG descriptors for rotation invariant 3D object detection*. In DAGM conference on Pattern recognition, pp. 142–151.

Skibbe, H., Reisert, M., Ronneberger, O., & Burkhardt, H. (2009). *Increasing the dimension of creativity in rotation invariant feature design using 3D tensorial harmonics*. In DAGM conference on Pattern recognition, pp. 141–150.

Skibbe, H., Reisert, M., Schmidt, T., Brox, T., Ronneberger, O., Burkhardt, H. (2012). Fast rotation invariant 3D feature computation utilizing efficient local neighborhood operators. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *34*(8), 1563–1575. Software available at https://bitbucket.org/skibbe/sta-imagetoolbox

Takacs, G., Chandrasekhar, V., Tsai, S., Chen, D., Grzeszczuk, R., Girod, B. (2010). *Unified real-time tracking and recognition with rotation-invariant fast features*. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 934–941.

Vedaldi, A., Blaschko, M., Zisserman, A. (2011). *Learning equivariant structured output SVM regressors*. In International Conference on Computer Vision, pp. 959–966.

Villamizar, M., Moreno-Noguer, F., Andrade-Cetto, J., Sanfeliu, A. (2010). *Efficient rotation invariant object detection using boosted random ferns*. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 1038–1045.

Wang, Q., Ronneberger, O., & Burkhardt, H. (2009). Rotational invariance based on Fourier analysis in polar and spherical coordinates. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, *31*, 1715–1722.

Wolberg, G., Zokai, S. (2000). *Robust image registration using log-polar transform*. In IEEE International Conference on Image Processing, pp. 493–496.

## Acknowledgments

This study was supported by the Excellence Initiative of the German Federal and State Governments: BIOSS Centre for Biological Signalling Studies (EXC 294) and the Bundesministerium für Bildung und Forschung (German Federal Ministry of Education and Research) Project: New Methods in Systems Biology (SYSTEC, 0101-31P5914) – Quantitative 3D and 4D cell analysis in living organisms.

Henrik Skibbe is indebted to the Baden-Württemberg Stiftung for the financial support by the Elite Program for Post-docs. Dr. Thomas Blein was supported by a long-term post-doctoral fellowship from European Molecular Biology Organization (EMBO, ALTF250-2009). Dr. Thomas Blein and Prof. Klaus Palme are also supported by Deutsches Zentrum für Luft und Raumfahrt (DLR 50WB1022) and the European Union Framework 6 Program (AUTOSCREEN, LSHG-CT-2007-037897).

## Additional information

All authors are part of the BIOSS Centre for Biological Signalling Studies, University of Freiburg.

## Appendix

### Computation of the Tensorial Harmonic Expansion

Given a spherical tensor field \(\mathbf{F} \in \mathcal{T }^{\ell }\), the tensorial harmonic expansion in Eq. (34) can be computed more efficiently than by direct projection.

First we compute the scalar (SH) expansion on each individual tensor component \({F}_m:\mathbb{R }^3 \rightarrow \mathbb{C }\) as

then the tensorial expansion coefficients \(\mathbf{a}^{j,k}(r)\) can be computed from the above component-wise expansions by a derived relation as

where \(-(j+k) \le m^{\prime } \le j+k\). See Reisert and Burkhardt (2009) for proofs. This computation requires the Clebsch-Gordan coefficients \(C\). An easy way to obtain them is through their relation to the Wigner 3-j symbols \(\left( \begin{array}{lll} j_1&j_2&j_3\\ m_1&m_2&m_3 \end{array}\right) \) (Brink and Satchler 1968), which is written as
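In its standard textbook form the relation reads as follows. This assumes the Condon-Shortley phase convention and the argument order \(C(j,m|j_1,m_1,j_2,m_2)\) used for the derivative weights later in this appendix; both are assumptions about the paper's exact notation.

\[
C(j,m\,|\,j_1,m_1,j_2,m_2) = (-1)^{j_1-j_2+m}\,\sqrt{2j+1}\,\left( \begin{array}{ccc} j_1 & j_2 & j\\ m_1 & m_2 & -m \end{array}\right)
\]

The coefficient vanishes unless \(m = m_1 + m_2\) and \(|j_1 - j_2| \le j \le j_1 + j_2\).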

One can use the function `gsl_sf_coupling_3j` in the GNU Scientific Library to compute the Wigner 3-j symbol. Note that this function takes each argument as an integer equal to twice the actual value.
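As a hedged illustration of this route, the sketch below evaluates the Wigner 3-j symbol with Racah's closed-form sum and converts it to a Clebsch-Gordan coefficient. It assumes integer ranks, the Condon-Shortley phase convention, and the argument order \(C(\ell ,m|\ell _1,m_1,\ell _2,m_2)\). The names `wigner_3j` and `clebsch_gordan` are hypothetical pure-Python helpers and not part of the GSL API.

```python
from math import factorial, sqrt


def wigner_3j(j1, j2, j3, m1, m2, m3):
    """Wigner 3-j symbol for integer arguments, via Racah's formula."""
    # Selection rules: projections sum to zero, |m| <= j, triangle inequality.
    if m1 + m2 + m3 != 0:
        return 0.0
    if abs(m1) > j1 or abs(m2) > j2 or abs(m3) > j3:
        return 0.0
    if not abs(j1 - j2) <= j3 <= j1 + j2:
        return 0.0
    f = factorial
    triangle = (f(j1 + j2 - j3) * f(j1 - j2 + j3) * f(-j1 + j2 + j3)
                / f(j1 + j2 + j3 + 1))
    norm = sqrt(triangle * f(j1 + m1) * f(j1 - m1) * f(j2 + m2)
                * f(j2 - m2) * f(j3 + m3) * f(j3 - m3))
    # Sum only over k for which every factorial argument is non-negative.
    k_min = max(0, j2 - j3 - m1, j1 - j3 + m2)
    k_max = min(j1 + j2 - j3, j1 - m1, j2 + m2)
    total = 0.0
    for k in range(k_min, k_max + 1):
        denom = (f(k) * f(j1 + j2 - j3 - k) * f(j1 - m1 - k) * f(j2 + m2 - k)
                 * f(j3 - j2 + m1 + k) * f(j3 - j1 - m2 + k))
        total += (-1) ** k / denom
    sign = -1.0 if (j1 - j2 - m3) % 2 else 1.0
    return sign * norm * total


def clebsch_gordan(l, m, l1, m1, l2, m2):
    """C(l,m | l1,m1, l2,m2) obtained from the Wigner 3-j symbol."""
    sign = -1.0 if (l1 - l2 + m) % 2 else 1.0
    return sign * sqrt(2 * l + 1) * wigner_3j(l1, l2, l, m1, m2, -m)


if __name__ == "__main__":
    # Coupling two rank-1 tensors to rank 2 with all m = 0 gives sqrt(2/3).
    print(clebsch_gordan(2, 0, 1, 0, 1, 0))
    # Coupling with a rank-0 tensor leaves the coefficient at 1.
    print(clebsch_gordan(3, 2, 3, 2, 0, 0))
```

The factorials are evaluated exactly as Python integers, so the sketch is reliable for the small ranks typical of such descriptors. For large ranks a library routine would be the safer choice.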

### Spherical Gaussian Derivatives

Let \(\mathbf{F} \in \mathcal{T }_\ell \), the spherical up-derivative \(\varvec{\nabla }^{1}_{}:\mathcal{T }_\ell \rightarrow \mathcal{T }_{\ell +1}\) and the down-derivative \(\varvec{\nabla }^{}_{1}:\mathcal{T }_\ell \rightarrow \mathcal{T }_{\ell -1}\) (Reisert and Burkhardt 2009) are defined as

where \(\nabla = (\frac{1}{\sqrt{2}}(\partial _x - \mathrm{i} \partial _y), \partial _z, -\frac{1}{\sqrt{2}}(\partial _x + \mathrm{i} \partial _y))\) is the spherical gradient operator, with \(\partial _x,\partial _y,\partial _z\) being the standard partial derivatives. Repeated application is further defined as \(\varvec{\nabla }^{j_u}_{j_d}\mathbf{V} = \underbrace{\varvec{\nabla }^{}_{1}\ldots \varvec{\nabla }^{}_{1}}_{j_d \text{ times }}\underbrace{\varvec{\nabla }^{1}_{}\ldots \varvec{\nabla }^{1}_{}}_{j_u \text{ times }}\mathbf{V}\). One important property of this operation is that it maps a spherical tensor field to a spherical tensor field of higher or lower rank. This is analogous to the fact that differentiating a scalar field produces a gradient field, which is a rank-\(1\) tensor, and a subsequent derivative can produce either the Hessian (rank-\(2\) tensor) or the divergence (rank-\(0\) tensor).
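To make the spherical gradient operator concrete, the following minimal sketch applies it to a scalar function at a single point, with the Cartesian partial derivatives approximated by central differences. The function name `spherical_gradient`, the step size `h`, and the test field are illustrative assumptions. The three components are returned in the order in which they appear in the definition above.

```python
import math


def spherical_gradient(f, point, h=1e-5):
    """Approximate the spherical gradient of a scalar function f(x, y, z).

    Returns ((dx - i*dy)/sqrt(2), dz, -(dx + i*dy)/sqrt(2)), where dx, dy, dz
    are central-difference estimates of the Cartesian partial derivatives.
    """
    x, y, z = point
    dx = (f(x + h, y, z) - f(x - h, y, z)) / (2 * h)
    dy = (f(x, y + h, z) - f(x, y - h, z)) / (2 * h)
    dz = (f(x, y, z + h) - f(x, y, z - h)) / (2 * h)
    s = 1.0 / math.sqrt(2.0)
    return (s * complex(dx, -dy), complex(dz, 0.0), -s * complex(dx, dy))


def linear_field(x, y, z):
    """Hypothetical test field with constant gradient (1, 2, 3)."""
    return x + 2.0 * y + 3.0 * z


g = spherical_gradient(linear_field, (0.3, -0.2, 0.5))

# The squared magnitudes add up to the squared Cartesian gradient norm.
norm_sq = sum(abs(c) ** 2 for c in g)
```

For a real-valued field the first and last components are related by `g[2] == -g[0].conjugate()`, so only two of the three components carry independent information.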

For \( \mathbf{V} = \varvec{\nabla }^{1}_{} \mathbf{V}^{\prime } \), where \(\mathbf{V}^{\prime }:\mathbb{R }^3 \rightarrow {\mathbb{C }}^{2(\ell -1)+1}, \mathbf{V}:\mathbb{R }^3 \rightarrow {\mathbb{C }}^{2\ell +1}\), index the elements of \(\mathbf{V} \) and \(\mathbf{V}^{\prime } \) as \(\{V_{-\ell },\ldots ,V_{\ell }\}\) and \(\{V^{\prime }_{-\ell +1},\ldots ,V^{\prime }_{\ell -1}\}\). The computation rule of \(\varvec{\nabla }^{1}_{}\) is then:

where the weighting coefficients \(w\) can be pre-computed from two *Clebsch-Gordan* coefficients as \(w{(\ell , m, a)} = \frac{C(\ell ,m|\ell -1 ,m-a, 1,a)}{C(\ell ,0|\ell -1, 0, 1, 0)}\). Thus the computation of the spherical tensor derivatives is just a set of weighted combinations of ordinary Cartesian derivatives.

Equation (47) also applies to the spherical down-derivative \( \mathbf{V} = \varvec{\nabla }^{}_{1} \mathbf{V}^{\prime } \), where \(\mathbf{V}:\mathbb{R }^3 \rightarrow {\mathbb{C }}^{2\ell +1}\) and \(\mathbf{V}^{\prime }:\mathbb{R }^3 \rightarrow {\mathbb{C }}^{2(\ell +1)+1}\). The only difference lies in the coefficients: \(w{(\ell , m, a)} = \frac{C(\ell ,m|\ell +1,m-a, 1,a)}{C(\ell ,0|\ell +1,0, 1,0)}\).
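The two weight definitions can be tabulated ahead of time. The sketch below is one way to do so, assuming integer ranks, the Condon-Shortley phase convention, and \(a \in \{-1, 0, 1\}\) for the three spherical gradient components. The helper `cg` uses Racah's direct closed form for the Clebsch-Gordan coefficient. The names `cg`, `up_weight`, and `down_weight` are hypothetical.

```python
from math import factorial, sqrt


def cg(l, m, l1, m1, l2, m2):
    """C(l,m | l1,m1, l2,m2) for integer arguments (Racah's closed form)."""
    if m1 + m2 != m or not abs(l1 - l2) <= l <= l1 + l2:
        return 0.0
    if abs(m) > l or abs(m1) > l1 or abs(m2) > l2:
        return 0.0
    f = factorial
    pre = sqrt((2 * l + 1) * f(l + l1 - l2) * f(l - l1 + l2) * f(l1 + l2 - l)
               / f(l1 + l2 + l + 1))
    pre *= sqrt(f(l + m) * f(l - m) * f(l1 - m1) * f(l1 + m1)
                * f(l2 - m2) * f(l2 + m2))
    # Keep only the terms whose factorial arguments are all non-negative.
    k_min = max(0, l2 - l - m1, l1 - l + m2)
    k_max = min(l1 + l2 - l, l1 - m1, l2 + m2)
    total = 0.0
    for k in range(k_min, k_max + 1):
        total += (-1) ** k / (f(k) * f(l1 + l2 - l - k) * f(l1 - m1 - k)
                              * f(l2 + m2 - k) * f(l - l2 + m1 + k)
                              * f(l - l1 - m2 + k))
    return pre * total


def up_weight(l, m, a):
    """w(l, m, a) for the up-derivative (input rank l-1, output rank l >= 1)."""
    return cg(l, m, l - 1, m - a, 1, a) / cg(l, 0, l - 1, 0, 1, 0)


def down_weight(l, m, a):
    """w(l, m, a) for the down-derivative (input rank l+1, output rank l)."""
    return cg(l, m, l + 1, m - a, 1, a) / cg(l, 0, l + 1, 0, 1, 0)


# Pre-compute all up-derivative weights for output rank 2.
rank = 2
up_table = {(m, a): up_weight(rank, m, a)
            for m in range(-rank, rank + 1) for a in (-1, 0, 1)}
```

By construction the weight for \(m = 0, a = 0\) equals 1, and entries whose input index \(m - a\) falls outside the input rank evaluate to zero.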

A fast filtering tool is derived by computing the derivatives of an isotropic Gaussian function, which creates a series of basis functions of different tensor ranks, \(\varvec{\nabla }^{j_u}_{j_d} G \in \mathcal{T }_{j_u-j_d}\) (where \(j_u \ge j_d\) and \(G\) is a Gaussian function). The convolution with the spherical Gaussian derivatives can be computed efficiently, like the standard Gaussian derivatives, because convolution and differentiation commute. As an example, for a spherical tensor field \(\mathbf{F} \in \mathcal{T }_\ell \), we have

We can therefore compute multiple filtering outputs (for different \(\{j_u,j_d\}\)) with a single tensorial convolution followed by differentiations. Note that a convolution such as \(G \;\widetilde{\bullet }_{(\ell | 0, \ell )} \;\mathbf{F}\) is equivalent to ordinary component-wise Gaussian convolutions, \([G \;\widetilde{\bullet }_{(\ell | 0, \ell )} \mathbf{F}]_m = G * F_m\) (because \(C(\ell ,m|\ell ,m,0,0) = 1\)). The output is a tensor field of rank \(\ell + j_u - j_d\). In the context of this paper, the spherical Gaussian derivatives (SGD) can be regarded as derivatives taken after a scale-space selection by Gaussian convolution. The only property that matters for rotation invariance is that the introduced basis functions are spherical tensor fields.
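The commutativity argument can be checked numerically in a scalar 1D analogue. This is a simplified sketch and not the tensorial convolution itself: differentiation is modelled as convolution with a central-difference kernel, and the signal, kernel width, and function names are illustrative assumptions.

```python
import math


def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def gaussian_kernel(sigma, radius):
    """Sampled Gaussian, normalised to unit sum."""
    w = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]


# Central difference (f[n+1] - f[n-1]) / 2 written as a convolution kernel;
# in the full convolution the result is shifted by one sample.
central_diff = [0.5, 0.0, -0.5]

signal = [0.0, 1.0, 4.0, 2.0, 7.0, 3.0, 3.0, 8.0, 1.0, 0.0]
gauss = gaussian_kernel(sigma=1.5, radius=4)

# Route 1: smooth once, then differentiate the smoothed signal.
smooth_then_diff = convolve(convolve(signal, gauss), central_diff)

# Route 2: differentiate the Gaussian first, then filter the signal with it.
derivative_kernel = convolve(gauss, central_diff)
filter_with_derivative = convolve(signal, derivative_kernel)

max_gap = max(abs(u - v)
              for u, v in zip(smooth_then_diff, filter_with_derivative))
```

Both routes agree up to floating-point rounding, so one smoothing pass can be reused for several derivative orders, which is the saving described above.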

## About this article

### Cite this article

Liu, K., Skibbe, H., Schmidt, T. *et al.* Rotation-Invariant HOG Descriptors Using Fourier Analysis in Polar and Spherical Coordinates. *Int J Comput Vis* **106**, 342–364 (2014). https://doi.org/10.1007/s11263-013-0634-z

### Keywords

- Rotation-invariance
- Image descriptor
- Fourier analysis
- Spherical harmonics
- Histogram of oriented gradients
- Feature design
- Volumetric data