Approximative Coding Methods for Channel Representations
Abstract
Most methods that address computer vision problems require powerful visual features. Many successful approaches apply techniques motivated by nonparametric statistics. The channel representation provides a framework for nonparametric distribution representation. Although early work focused on a signal processing view of the representation, the channel representation can be interpreted in probabilistic terms, e.g., representing the distribution of local image orientation. In this paper, a variety of approximative channel-based algorithms for probabilistic problems are presented: a novel efficient algorithm for density reconstruction, a novel and efficient scheme for nonlinear gridding of densities, and finally a novel method for estimating Copula densities. The experimental results provide evidence that by relaxing the requirements for exact solutions, efficient algorithms are obtained.
Keywords
Visual features, Channel representations, Approximative density estimation, Maximum entropy
1 Introduction
Visual feature descriptors are essential to solve computer vision problems with state-of-the-art methods. Although deep learning [18] eliminates the need to design feature descriptors by hand, approximative algorithms for probabilistic processing of feature layers are useful, e.g., for visualization [20, 31]. Furthermore, certain problems require more lightweight solutions and cannot make use of deep learning. Instead, combinations of designed feature descriptors with shallow networks or other machine learning approaches are more appropriate and produce good results, e.g., for real-time online learning of path following [21, 23]. A demonstration video of such a system is available online (https://goo.gl/JcvqHz). The system requires obtaining a full reconstruction of represented probability densities. Furthermore, the representation should be adapted to nonlinear domains, such as depth. In cases where there are dependencies between signals, statistical approaches are expected to improve if the dependency structure can be properly handled and separated from the marginal distributions.
Whereas distribution fields (DFs) [27] make an ordinary bin assignment and apply post-smoothing, channel representations apply a soft assignment, i.e., pre-smoothing. This has been shown to be more efficient [4]. Similarly, SIFT descriptors can be considered a particular variant of the channel representation of local orientation, and the latter framework allows generalizing to color images [7]. HOG descriptors are a specific variant of channel-coded feature maps (CCFMs) [16], but, in contrast to the former, CCFMs require no additional visualization method [30]. CCFMs are based on frame theory, which comes with a decoding methodology that also covers visual reconstruction [3].

From the measured coefficients in the nonparametric density representation, a continuous density is to be estimated under the assumption of minimum information (maximum entropy) [15].

Whereas histogram bins are often equally distributed, i.e., the bin centers sample the input space regularly, highly varying densities require a nonlinear transformation of the input space before gridding. The resulting nonconstant measure is to be compensated during the nonregular gridding of the input space.

A joint density can be turned into a Copula distribution by transforming its marginals into uniform distributions. Similar to the second problem, the induced measure is to be taken care of during the calculation of the Copula distribution.
2 The Channel Representation
The channel representation was proposed by Granlund [11]. It shares similarities with population codes [25, 29] and, in line with their probabilistic interpretation [32], channel representations approximate a kernel density estimator [5]. The mathematical proof has essentially already been given in the context of averaged shifted histograms [26]. A further related representation, orientation scores, is based on generalized wavelet theory [2].
Bilateral filtering allows denoising a signal or an image without blurring edges because the different intensity/color levels on the two sides of the edge are represented in different parts of the model after the encoding. Thus, the two levels are not confused during spatial averaging. Instead, close to the edge a metamery region is formed, i.e., two different modes occur. The task during decoding is then to pick the stronger mode and to determine its maximum.
2.1 Encoding
2.2 Decoding/Reconstruction
Various ways to decode channel representations for different kernels have been suggested in the past [5, 10]. For the \(\cos^2\) kernel, different degrees of overlap and confidence measures have been considered [10]. In this short review, we describe the recently suggested maximum likelihood decoding [8].
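To make the encode, average, and decode cycle discussed above concrete, the following is a minimal sketch and not the maximum likelihood decoding of [8]. It assumes unnormalized \(\cos^2\) kernels of width three with channel centers at the integers, and it uses the classic local decoding over three neighboring channels. The function names `encode` and `decode` are hypothetical.

```python
import math

def encode(x, n_channels):
    """cos^2 channel encoding (assumed: integer centers, kernel support |d| < 1.5)."""
    coeffs = []
    for n in range(n_channels):
        d = x - n
        coeffs.append(math.cos(math.pi * d / 3) ** 2 if abs(d) < 1.5 else 0.0)
    return coeffs

def decode(coeffs):
    """Local decoding: pick the strongest window of three channels, then
    recover the position from the phase of a three-term complex sum."""
    l = max(range(len(coeffs) - 2), key=lambda i: sum(coeffs[i:i + 3]))
    re = sum(coeffs[l + k] * math.cos(2 * math.pi * k / 3) for k in range(3))
    im = sum(coeffs[l + k] * math.sin(2 * math.pi * k / 3) for k in range(3))
    return l + 3 / (2 * math.pi) * math.atan2(im, re)

# Three samples near 2.0 and one outlier at 7.0 are encoded and averaged.
samples = [1.9, 2.0, 2.1, 7.0]
encoded = [encode(s, 10) for s in samples]
average = [sum(col) / len(samples) for col in zip(*encoded)]
estimate = decode(average)  # follows the stronger mode, not the arithmetic mean
```

The outlier ends up in channels that do not overlap with the dominant mode, so the decoded value stays at the stronger mode. Samples closer than half a channel spacing to either end of the channel range are not handled by this sketch.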
2.3 Maximum Entropy Reconstruction
3 Maximum Approximative Entropy Reconstruction
The approximation is limited to a linear expansion in (13) to simplify subsequent equations. Higher orders might lead to better accuracy, but at the cost of a significantly more complicated solution than (14).
3.1 Direct Solution
3.2 Nonnegativity Constraint
Conjecture 1
3.3 Simulation Experiments
The reconstruction procedures are evaluated on samples drawn from known distributions. The \(N=10\) channel coefficients \(\mathbf {c}\) are set to their expected values, corresponding to infinitely many samples. From the channel coefficients, the maximum entropy and approximate entropy estimates of the distributions are calculated using the methods of Sects. 2.3 and 3, respectively.
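As an illustration of setting channel coefficients to their expected values, the sketch below integrates a known density against each channel kernel on a regular grid. The unnormalized \(\cos^2\) kernel, the integer channel centers 0 to 9, the Gaussian test density, and the grid size are all assumptions made for this example; they are not taken from the experiments.

```python
import math

def kernel(d):
    """Unnormalized cos^2 kernel with support |d| < 1.5 (assumed form)."""
    return math.cos(math.pi * d / 3) ** 2 if abs(d) < 1.5 else 0.0

def expected_coefficients(density, n_channels, lo, hi, grid=1000):
    """Midpoint-rule approximation of c_n = integral of kernel(x - n) * density(x)."""
    step = (hi - lo) / grid
    coeffs = [0.0] * n_channels
    for i in range(grid):
        x = lo + (i + 0.5) * step
        p = density(x)
        for n in range(n_channels):
            coeffs[n] += kernel(x - n) * p * step
    return coeffs

def gaussian(x, mu=4.5, sigma=0.8):
    """Hypothetical test density, well inside the channel range."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

c = expected_coefficients(gaussian, 10, -0.5, 9.5)
```

Because three unnormalized kernels overlap at every interior point and sum to 3/2 there, the coefficients of a density concentrated in the interior sum to approximately 3/2.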
The results are shown in Fig. 4, after six iterations and after convergence. The maximum entropy approach uses Newton iterations as suggested in [15]. Note that each element of the Jacobian requires numerical evaluation of an integral. The maximum approximative entropy approach uses Newton iterations for fulfilling the nonnegativity constraints and gradient descent for minimizing (30). The Jacobian is obtained by matrix computations, with the number of elements related to the number of channels used. Using Matlab implementations and gridding the integrals at 100 points, each iteration of the maximum entropy approach requires 3–5 ms of computation time. For the maximum approximative entropy approach using Newton iterations, each iteration takes 0.5–0.6 ms.
For samples drawn from distributions with smooth density functions, the initial solution using the approximate entropy is close to the final solution. For density functions with discontinuities (upper row), the initial solution obtains negative values. However, fewer than six iterations are required to obtain a valid density function. The use of a weighted norm (30) has a small impact on the final result, generating a solution slightly closer to the true distribution function in the high-density areas in Fig. 4, top right.
3.4 Regression Learning Experiments
The results from the simulation experiment above are confirmed by regression learning experiments. In these experiments, the head yaw angle for a set of people, taken from the Multi-PIE dataset [13], has to be estimated. The experiment is described in detail in [14] and the channel-based regression method has been described in [21]. Channel-based regression clearly outperforms robust regression as introduced in [14], which is why we use the former as baseline below.
The experiment provides a successively growing amount of training data to the regression method, which is evaluated on each subsequent batch of data before that batch is used for training. When comparing the performance of the new decoding method and the original method [21], we observe an increase in error after about 50 training samples, before both methods coincide after about 500 training samples. This intermediate decay of performance is presumably caused by secondary modes in the density function originating from the regression.
Beyond 1000 training samples, the baseline method with standard decoding [21] does not improve further. It even decays slightly, presumably because of bias effects from the maximum operation on the channels. The proposed method, however, continues to improve until the end of the experiment and would likely improve further if more data were available. The final improvement of performance is larger than \(15\%\).
4 Nonregular Channel Placement
In most applications, the channel centers are distributed evenly in the space to be represented. In certain applications, however, other channel placements are beneficial. In this section, logarithmic and log-polar placements are presented along with some results and pointers to suitable applications.
4.1 Logarithmic Channels
Using logarithmic channels, the ability to resolve nearby encoded values varies over the domain. One typical application would be encoding events in time, where high resolution is required for recent events and low resolution suffices for older events. Referring to an event “about an hour ago,” the precision is some tens of minutes, while referring to an event “about 3 months ago,” the precision is some tens of days.
The major advantage of logarithmic transformations is that scaling of the encoded values leads to a shift of the channel coefficients. In the example above, scaling values by a factor of two leads to a shift of the coefficients by three channels. Since humans often perceive entities in relative terms (see the example regarding temporal precision above, or pitch spaces in music), the logarithmic mapping is biologically well-motivated. Relative changes are also of interest in projective geometry, e.g., in depth estimation.
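The scale-to-shift property can be checked with a small sketch. It assumes three channels per octave, which is inferred from the statement that a factor of two corresponds to three channels, together with unnormalized \(\cos^2\) kernels at integer positions in the transformed domain. The name `encode_log` is hypothetical.

```python
import math

CHANNELS_PER_OCTAVE = 3  # assumption: doubling a value moves it by three channels

def kernel(d):
    """Unnormalized cos^2 kernel with support |d| < 1.5 (assumed form)."""
    return math.cos(math.pi * d / 3) ** 2 if abs(d) < 1.5 else 0.0

def encode_log(value, n_channels):
    """Encode a positive value with channels placed regularly in log2(value)."""
    u = CHANNELS_PER_OCTAVE * math.log2(value)
    return [kernel(u - n) for n in range(n_channels)]

c_single = encode_log(5.0, 16)
c_double = encode_log(10.0, 16)  # same coefficients, moved up by three channels
```

Comparing `c_double[n + 3]` with `c_single[n]` shows the shift; coefficients that would move beyond the last channel are lost, so the property only holds inside the supported range.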
4.2 Log-Polar Channels
A polar coordinate system can be employed to extend the logarithmic channels to images. Log-polar coordinate systems have been applied to images before, e.g., for filter design in the Fourier domain [12] and similitude group invariant transforms, both globally [9] and locally [28].
In the log-polar channel arrangement, channels are regularly placed around concentric circles (representing orientation) with logarithmically increasing distance from the center. The setup stems from foveal vision, with higher resolution in the central parts; see Fig. 8.
A Cartesian image position \((x,\ y)\) is mapped to the log-polar grid \((r,\ \theta )\) by the complex logarithm \(r + i\theta = \mathrm {Log}(x + iy) \). The logarithmic radial position r and the angular position \(\theta \) are encoded in an outer product channel representation.
Channel coefficients are scaled with a factor \(1/(x^2 + y^2)\) to maintain a constant weight of all basis functions, compensating for the polar coordinate system and the logarithm of the radial position. Note that the supported radial range is limited at both ends, avoiding an infinite channel density at the origin.
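The mapping and weighting described above can be sketched as follows. The complex logarithm and the factor \(1/(x^2 + y^2)\) come from the text; the unnormalized \(\cos^2\) kernel, the numbers of radial and angular channels, and the radial spacing of 0.5 in log radius are assumptions chosen for illustration. The name `encode_logpolar` is hypothetical.

```python
import cmath
import math

def kernel(d):
    """Unnormalized cos^2 kernel with support |d| < 1.5 (assumed form)."""
    return math.cos(math.pi * d / 3) ** 2 if abs(d) < 1.5 else 0.0

def encode_logpolar(x, y, n_r=6, n_theta=8, r_spacing=0.5):
    """Outer product encoding of (r, theta), where r + i*theta = Log(x + iy)."""
    z = cmath.log(complex(x, y))
    r, theta = z.real, z.imag
    u = r / r_spacing  # radial channel coordinate; only 0 .. n_r - 1 is covered
    t = (theta % (2 * math.pi)) / (2 * math.pi) * n_theta
    radial = [kernel(u - n) for n in range(n_r)]
    # Angular distances wrap around, since orientation is periodic.
    angular = [kernel((t - n + n_theta / 2) % n_theta - n_theta / 2)
               for n in range(n_theta)]
    weight = 1.0 / (x * x + y * y)  # scaling factor stated in the text
    return [[weight * a * b for b in angular] for a in radial]

coeffs = encode_logpolar(3.0, 4.0)
total = sum(sum(row) for row in coeffs)
```

With these settings the radial channels cover radii from 1 to roughly 12, which reflects the limited radial range mentioned above. For a point in the interior of that range, the unweighted coefficients sum to \((3/2)^2\).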
4.3 Visual Tracking
One application for the log-polar channel layout is visual tracking. The operation of moving the central position of the log-polar grid followed by re-encoding the image is approximated by a linear operation directly on the previous channel coefficients. Since the high-resolution area will be at a different part of the image after translation, where only lower resolution information is available in the previous representation, only an approximation of the representation is obtained.
For rotations of the image in increments of the channel spacing in the angular direction, the corresponding new channel coefficients are obtained through a circular shift of the old coefficients. Combining the rotation operation with a single translation operation, translations in all directions can be generated through combined rotation-shift-inverse-rotation operations.
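The circular-shift property for rotations can be illustrated on the angular channels alone. This sketch assumes eight angular channels with unnormalized \(\cos^2\) kernels and periodic distances; `encode_angle` is a hypothetical helper.

```python
import math

N_THETA = 8  # assumed number of angular channels

def kernel(d):
    """Unnormalized cos^2 kernel with support |d| < 1.5 (assumed form)."""
    return math.cos(math.pi * d / 3) ** 2 if abs(d) < 1.5 else 0.0

def encode_angle(theta):
    """Encode an angle with channels placed regularly on the circle."""
    t = (theta % (2 * math.pi)) / (2 * math.pi) * N_THETA
    return [kernel((t - n + N_THETA / 2) % N_THETA - N_THETA / 2)
            for n in range(N_THETA)]

step = 2 * math.pi / N_THETA          # one channel spacing in the angular direction
before = encode_angle(6.0)
after = encode_angle(6.0 + step)      # rotation by one channel spacing
shifted = before[-1:] + before[:-1]   # circular shift of the old coefficients
```

Here `after` and `shifted` agree up to rounding, including the wrap-around from the last channel to the first.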
With shift operations of different lengths, effects of operations in the 2D translation space can be sampled. By comparison with the representation of the next frame, translation information between the frames is obtained. This is illustrated in Fig. 10, where the errors after operations in the translation space are sampled in a log-polar grid and illustrated using log-polar channels. In this manner, more information regarding the local error surface is obtained.
5 Uniformization and Copula Estimation
Extending the idea of nonregular channel placement, channels should be placed depending on the data to be encoded, with high channel density where samples are likely. This can be obtained by mapping samples using the cumulative density function of the distribution from which the samples are drawn. Usually this function is not available, but using the ideas of density reconstruction from Sect. 3, a useful representation of the cumulative density function can be obtained and maintained online.
The mapped values will be close to uniformly distributed (using the true cumulative density functions, the mapped values will be uniformly distributed). Placing a new set of regularly spaced channels in this transformed space, their distribution in the original space will be sample density dependent.
For multidimensional distributions, this can be used to estimate the Copula, which clearly indicates dependencies between dimensions by removing the effect of the marginal distributions. This is obtained by estimating marginal densities using the approach of Sect. 3, where the estimation of \(\mathbf {c}\) can be done incrementally. The reconstruction coefficients \(\lambda \) are updated by iterating (36) once after every new data point. The (density) Copula representation is obtained by encoding the mapped points using an outer product channel representation on the space \([0, 1] \times [0, 1]\). For independent random variables, the Copula is constant (one).
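The effect of uniformizing the marginals can be illustrated with a simplified batch sketch. It deliberately swaps in two simpler techniques: the marginal cumulative functions are replaced by empirical ranks rather than the incremental reconstruction of Sect. 3, and the outer product channel representation is replaced by a plain histogram on \([0, 1] \times [0, 1]\). All function names and the test distributions are hypothetical.

```python
import random

def to_uniform(values):
    """Map samples into (0, 1) through their empirical ranks (stand-in for the CDF)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    u = [0.0] * len(values)
    for rank, i in enumerate(order):
        u[i] = (rank + 0.5) / len(values)
    return u

def copula_histogram(xs, ys, bins=4):
    """Histogram estimate of the Copula density; a constant density equals one."""
    u, v = to_uniform(xs), to_uniform(ys)
    counts = [[0] * bins for _ in range(bins)]
    for a, b in zip(u, v):
        counts[min(int(a * bins), bins - 1)][min(int(b * bins), bins - 1)] += 1
    scale = bins * bins / len(xs)
    return [[c * scale for c in row] for row in counts]

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(4000)]
independent = copula_histogram(xs, [rng.expovariate(1.0) for _ in xs])
dependent = copula_histogram(xs, [x + 0.1 * rng.gauss(0.0, 1.0) for x in xs])
```

For the independent pair, all cells stay close to one even though the two marginals differ strongly. For the dependent pair, the mass concentrates along the diagonal, which is the dependency structure left after removing the marginals.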
5.1 Experiments
Figure 12 illustrates the representation of one of the marginal distributions and the Copula estimation basis functions mapped through the inverse estimated marginal cumulative density function. Since the marginal distribution is smooth, the estimated densities follow the true densities closely. Figure 13 indicates the state of the estimate of the marginal distribution after 20 samples have been observed.
6 Conclusion
Channel representations are descriptors for visual features, motivated from nonparametric statistics. Powerful visual features are fundamental requirements for applying machine learning techniques to computer vision problems, e.g., for learning path following [23] and visual tracking [22].
This work extends previous work on channel representations that often only addressed orientation estimation or smoothing problems. We have presented a variety of approximative channel-based algorithms for probabilistic problems: a novel efficient algorithm for density reconstruction, a novel and efficient scheme for nonlinear gridding of densities, and finally a novel method for estimating Copula densities.
The proposed algorithms have been evaluated, and the experimental results provide evidence that by relaxing the requirements for exact solutions, efficient algorithms are obtained while retaining low approximation errors.
The incorporation of the proposed methods into existing learning systems, such as [21], and into new systems remains future work. With the novel algorithms at hand, possibly new problems can be approached or at least known problems can be approached in novel ways.
Footnotes
 1.
If \(\alpha _{n}\) is outside that range, \(r_{n}\) needs to be modified [8].
 2.
Note that the boundary conditions (27) are nontrivial: Due to the instability of the filters, numerical results might differ. However, we know from our assumptions that all \(\lambda _n=0\) for \(n<1\) or \(n>N\) and thus \(c_n=0\) for \(n>N\).
Notes
Acknowledgements
The authors express their gratitude to Ulrich Köthe for discussions on the topic and in particular for proposing the use of the Copula. This research was partly funded by the Swedish Research Council through a framework grant for the project Energy Minimization for Computational Cameras (20146227), by SSF grant RIT 150097 SymbiCloud, and by the excellence center ELLIIT.
References
 1. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2005, vol. 1, pp. 886–893 (2005). https://doi.org/10.1109/CVPR.2005.177
 2. Duits, R., Franken, E.: Left-invariant parabolic evolutions on SE(2) and contour enhancement via invertible orientation scores part II: nonlinear left-invariant diffusions on invertible orientation scores. Q. Appl. Math. 68(2), 293–331 (2010). https://doi.org/10.1090/S0033569X10011733
 3. Felsberg, M.: Incremental computation of feature hierarchies. In: Pattern Recognition, Lecture Notes in Computer Science, vol. 6376, pp. 523–532. Springer, Berlin (2010). https://doi.org/10.1007/9783642159862_53
 4. Felsberg, M.: Enhanced distribution field tracking using channel representations. In: IEEE ICCV Workshop On Visual Object Tracking Challenge (2013)
 5. Felsberg, M., Forssén, P.E., Scharr, H.: Channel smoothing: efficient robust smoothing of low-level signal features. IEEE Trans. Pattern Anal. Mach. Intell. 28(2), 209–222 (2006)
 6. Felsberg, M., Granlund, G.: Anisotropic channel filtering. In: Proceedings of 13th Scandinavian Conference on Image Analysis, LNCS 2749, pp. 755–762 (2003)
 7. Felsberg, M., Hedborg, J.: Real-time view-based pose recognition and interpolation for tracking initialization. J. Real Time Image Process. 2(2–3), 103–116 (2007)
 8. Felsberg, M., Öfjäll, K., Lenz, R.: Unbiased decoding of biologically motivated visual feature descriptors. Front. Robot. AI 2, 20 (2015). https://doi.org/10.3389/frobt.2015.00020
 9. Ferraro, M., Caelli, T.M.: Lie transformation groups, integral transforms, and invariant pattern recognition. Spat. Vis. 8(4), 33–44 (1994)
 10. Forssén, P.E.: Low and medium level vision using channel representations. Ph.D. Thesis, Linköping University, Sweden (2004)
 11. Granlund, G.H.: An associative perception-action structure using a localized space variant information representation. In: Proceedings of Algebraic Frames for the Perception-Action Cycle (AFPAC), Germany (2000)
 12. Granlund, G.H., Knutsson, H.: Signal Processing for Computer Vision. Kluwer Academic Publishers, Dordrecht (1995)
 13. Gross, R., Matthews, I., Cohn, J., Kanade, T., Baker, S.: Multi-PIE. Image Vision Comput. 28(5), 807–813 (2010). https://doi.org/10.1016/j.imavis.2009.08.002. (Best of Automatic Face and Gesture Recognition 2008)
 14. Huang, D., Cabral, R.S., De la Torre, F.: Robust regression. In: European Conference on Computer Vision (ECCV) (2012)
 15. Jonsson, E., Felsberg, M.: Reconstruction of probability density functions from channel representations. In: Proceedings of 14th Scandinavian Conference on Image Analysis (2005)
 16. Jonsson, E., Felsberg, M.: Efficient computation of channel-coded feature maps through piecewise polynomials. Image Vis. Comput. 27(11), 1688–1694 (2009). https://doi.org/10.1016/j.imavis.2008.11.002
 17. Kass, M., Solomon, J.: Smoothed local histogram filters. In: ACM SIGGRAPH 2010 Papers, SIGGRAPH ’10, pp. 100:1–100:10. ACM, New York, NY, USA (2010). https://doi.org/10.1145/1833349.1778837
 18. LeCun, Y., Bengio, Y.: Convolutional networks for images, speech, and time series. Handb. Brain Theory Neural Netw., pp. 255–258 (1995)
 19. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 60(2), 91–110 (2004)
 20. Mahendran, A., Vedaldi, A.: Understanding deep image representations by inverting them. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
 21. Öfjäll, K., Felsberg, M.: Biologically inspired online learning of visual autonomous driving. In: Proceedings of the British Machine Vision Conference. BMVA Press (2014)
 22. Öfjäll, K., Felsberg, M.: Weighted update and comparison for channel-based distribution field tracking. In: ECCV 2014 Workshops, Lecture Notes in Computer Science, vol. 8926, pp. 218–231. Springer (2015). https://doi.org/10.1007/9783319161815_15
 23. Öfjäll, K., Felsberg, M., Robinson, A.: Visual autonomous road following by symbiotic online learning. In: 2016 IEEE Intelligent Vehicles Symposium Proceedings (2016)
 24. Paris, S., Durand, F.: A fast approximation of the bilateral filter using a signal processing approach. In: European Conference on Computer Vision (2006)
 25. Pouget, A., Dayan, P., Zemel, R.: Information processing with population codes. Nat. Rev. Neurosci. 1, 125–132 (2000)
 26. Scott, D.W.: Averaged shifted histograms: effective nonparametric density estimators in several dimensions. Ann. Stat. 13(3), 1024–1040 (1985)
 27. Sevilla-Lara, L., Learned-Miller, E.: Distribution fields for tracking. In: IEEE Computer Vision and Pattern Recognition (2012)
 28. Sharma, U., Duits, R.: Left-invariant evolutions of wavelet transforms on the similitude group. Appl. Comput. Harmonic Anal. 39(1), 110–137 (2015). https://doi.org/10.1016/j.acha.2014.09.001
 29. Snippe, H.P., Koenderink, J.J.: Discrimination thresholds for channel-coded systems. Biol. Cybern. 66, 543–551 (1992)
 30. Vondrick, C., Khosla, A., Malisiewicz, T., Torralba, A.: HOGgles: visualizing object detection features. In: ICCV (2013)
 31. Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: ECCV 2014, Lecture Notes in Computer Science, vol. 8689, pp. 818–833. Springer (2014). https://doi.org/10.1007/9783319105901_53
 32. Zemel, R.S., Dayan, P., Pouget, A.: Probabilistic interpretation of population codes. Neural Comput. 10(2), 403–430 (1998)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.