## Abstract

Tuning properties of simple cells in cortical V1 can be described in terms of a “universal shape” characterized quantitatively by parameter values which hold across different species (Jones and Palmer 1987; Ringach 2002; Niell and Stryker 2008). This puzzling set of findings calls for a general explanation grounded in an evolutionarily important computational function of the visual cortex. We show here that these properties are quantitatively predicted by the hypothesis that the goal of the ventral stream is to compute, for each image, a “signature” vector which is invariant to geometric transformations (Anselmi et al. 2013b). The mechanism for continuously learning and maintaining invariance may be the memory storage of a sequence of neural images of a few (arbitrary) objects, via Hebbian synapses, while the objects undergo transformations such as translation, scale changes and rotation. For V1 simple cells this hypothesis implies that the tuning of neurons converges to the eigenvectors of the covariance of their input. Starting with a set of dendritic fields spanning a range of sizes, we show, with simulations suggested by a direct analysis, that the solution of the associated “cortical equation” effectively provides a set of Gabor-like shapes with parameter values that quantitatively agree with the physiology data. The same theory provides predictions about the tuning of cells in V4 and in the face patch AL (Leibo et al. 2013a) which are in qualitative agreement with physiology data.

### Keywords

- Visual Experience
- Independent Component Analysis
- Simple Cell
- Gabor Wavelet
- Deep Neural Network



## References

Abdel-Hamid O, Mohamed A, Jiang H, Penn G (2012) Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition. In: 2012 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 4277–4280. IEEE

Anselmi F, Leibo JZ, Mutch J, Rosasco L, Tacchetti A, Poggio T (2013a) Part I: computation of invariant representations in visual cortex and in deep convolutional architectures. In preparation

Anselmi F, Leibo JZ, Rosasco L, Mutch J, Tacchetti A, Poggio T (2013b) Unsupervised learning of invariant representations in hierarchical architectures. Theoret Comput Sci. CBMM Memo n 1, in press. arXiv:1311.4158

Anselmi F, Poggio T (2010) Representation learning in sensory cortex: a theory. CBMM memo n 26

Bell A, Sejnowski T (1997) The independent components of natural scenes are edge filters. Vis Res 3327–3338

Boyd J (1984) Asymptotic coefficients of hermite function series. J Comput Phys 54:382–410

Croner L, Kaplan E (1995) Receptive fields of p and m ganglion cells across the primate retina. Vis Res 35(1):7–24

Dan Y, Atick JJ, Reid RC (1996) Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. J Neurosci 16:3351–3362

Földiák P (1991) Learning invariance from transformation sequences. Neural Comput 3(2):194–200

Freiwald W, Tsao D (2010) Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science 330(6005):845

Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36(4):193–202

Gallant J, Connor C, Rakshit S, Lewis J, Van Essen D (1996) Neural responses to polar, hyperbolic, and Cartesian gratings in area V4 of the macaque monkey. J Neurophysiol 76:2718–2739

Hebb DO (1949) The organization of behaviour: a neuropsychological theory. Wiley

Hyvärinen A, Oja E (1998) Independent component analysis by general non-linear Hebbian-like learning rules. Signal Process 64:301–313

Jones JP, Palmer LA (1987) An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J Neurophysiol 58(6):1233–1258

Kay K, Naselaris T, Prenger R, Gallant J (2008) Identifying natural images from human brain activity. Nature 452(7185):352–355

Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25

Le QV, Monga R, Devin M, Corrado G, Chen K, Ranzato M, Dean J, Ng AY (2011) Building high-level features using large scale unsupervised learning. CoRR. arXiv:1112.6209

LeCun Y, Boser B, Denker J, Henderson D, Howard R, Hubbard W, Jackel L (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551

LeCun Y, Bengio Y (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, pp 255–258

Leibo JZ, Anselmi F, Mutch J, Ebihara AF, Freiwald WA, Poggio T (2013a) View-invariance and mirror-symmetric tuning in a model of the macaque face-processing system. Comput Syst Neurosci I–54. Salt Lake City, USA

Leibo JZ, Anselmi F, Mutch J, Ebihara AF, Freiwald WA, Poggio T (2013b) View-invariance and mirror-symmetric tuning in a model of the macaque face-processing system. Comput Syst Neurosci (COSYNE)

Li N, DiCarlo JJ (2008) Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science 321(5895):1502–1507

Mallat S (2012) Group invariant scattering. Commun Pure Appl Math 65(10):1331–1398

Meister M, Wong R, Baylor DA, Shatz CJ et al (1991) Synchronous bursts of action potentials in ganglion cells of the developing mammalian retina. Science 252(5008):939–943

Mel BW (1997) SEEMORE: combining color, shape, and texture histogramming in a neurally inspired approach to visual object recognition. Neural Comput 9(4):777–804

Müller-Kirsten HJW (2012) Introduction to quantum mechanics: Schrödinger equation and path integral, 2nd edn. World Scientific, Singapore

Mutch J, Lowe D (2008) Object class recognition and localization using sparse features with limited receptive fields. Int J Comput Vis 80(1):45–57

Niell C, Stryker M (2008) Highly selective receptive fields in mouse visual cortex. J Neurosci 28(30):7520–7536

Oja E (1982) Simplified neuron model as a principal component analyzer. J Math Biol 15(3):267–273

Oja E (1992) Principal components, minor components, and linear neural networks. Neural Netw 5(6):927–935

Olshausen BA, Cadieu CF, Warland D (2009) Learning real and complex overcomplete representations from the statistics of natural images. In: Goyal VK, Papadakis M, van de Ville D (eds) SPIE Proceedings, vol. 7446: Wavelets XIII

Olshausen B et al (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607–609

Perona P (1991) Deformable kernels for early vision. IEEE Trans Pattern Anal Mach Intell 17:488–499

Perrett D, Oram M (1993) Neurophysiology of shape processing. Image Vis Comput 11(6):317–333

Pinto N, DiCarlo JJ, Cox D (2009) How far can you get with a modern face recognition test set using only simple features? In: CVPR 2009. IEEE Conference on computer vision and pattern recognition, 2009. IEEE, pp 2591–2598

Poggio T, Edelman S (1990) A network that learns to recognize three-dimensional objects. Nature 343(6255):263–266

Poggio T, Mutch J, Anselmi F, Leibo JZ, Rosasco L, Tacchetti A (2011) Invariances determine the hierarchical architecture and the tuning properties of the ventral stream. Technical report available online, MIT CBCL, 2013. Previously released as MIT-CSAIL-TR-2012-035, 2012 and in Nature Precedings, 2011

Poggio T, Mutch J, Anselmi F, Leibo JZ, Rosasco L, Tacchetti A (2012) The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work). Technical report MIT-CSAIL-TR-2012-035, MIT Computer Science and Artificial Intelligence Laboratory, 2012. Previously released in Nature Precedings, 2011

Poggio T, Mutch J, Isik L (2014) Computational role of eccentricity dependent cortical magnification. CBMM Memo No. 017. CBMM Funded. arXiv:1406.1770v1

Rehn M, Sommer FT (2007) A network that uses few active neurones to code visual input predicts the diverse shapes of cortical receptive fields. J Comput Neurosci 22(2):135–146

Riesenhuber M, Poggio T (1999) Hierarchical models of object recognition in cortex. Nature Neurosci. 2(11):1019–1025

Ringach D (2002) Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex. J Neurophysiol 88(1):455–463

Saxe AM, Bhand M, Mudur R, Suresh B, Ng AY (2011) Unsupervised learning models of primary cortical receptive fields and receptive field plasticity. In: Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger K (eds) Advances in neural information processing systems, vol 24, pp 1971–1979

Serre T, Wolf L, Bileschi S, Riesenhuber M, Poggio T (2007) Robust object recognition with cortex-like mechanisms. IEEE Trans Pattern Anal Mach Intell 29(3):411–426

Stevens CF (2004) Preserving properties of object shape by computations in primary visual cortex. PNAS 101(11):15524–15529

Stringer S, Rolls E (2002) Invariant object recognition in the visual system with novel views of 3D objects. Neural Comput 14(11):2585–2596

Torralba A, Oliva A (2003) Statistics of natural image categories. In: Network: computation in neural systems, pp 391–412

Turrigiano GG, Nelson SB (2004) Homeostatic plasticity in the developing nervous system. Nature Rev Neurosci 5(2):97–107

Wong R, Meister M, Shatz C (1993) Transient period of correlated bursting activity during development of the mammalian retina. Neuron 11(5):923–938

Zylberberg J, Murphy JT, DeWeese MR (2011) A sparse coding model with synaptically local plasticity and spiking neurons can account for the diverse shapes of v1 simple cell receptive fields. PLoS Comput Biol 7(10):135–146

## Acknowledgments

This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF 1231216.


## 1 Appendix

### 1.1 Retinal Processing

Our simulation pipeline consists of several filtering stages that mimic retinal processing, followed by a Gaussian mask, as shown in Fig. 4. Values for the DoG filter were those suggested by Croner and Kaplan (1995); the spatial low-pass filter has frequency response \(1/\sqrt{\omega ^{2}_{x}+\omega ^{2}_{y}}\). The temporal derivative is performed using imbalanced weights (−0.95, 1) so that the DC component is not zero. Each cell learns by extracting the principal components of a movie generated by a natural image patch undergoing a rigid translation. Each frame goes through the pipeline described here and is then fed to the unsupervised learning module (computing eigenvectors of the covariance). We used 40 natural images and 19 different Gaussian apertures for the simulations presented in this book chapter (Fig. 5).
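To make the pipeline concrete, here is a minimal 1-D sketch (all parameters are illustrative, not those of the chapter; the DoG and spatial low-pass stages are omitted for brevity, and a random signal with a 1/f amplitude spectrum stands in for a natural image row):

```python
import numpy as np

rng = np.random.default_rng(0)

def translating_movie(signal, window, n_frames):
    """Frames of a 1-D 'image' sliding one sample per frame past a window."""
    return np.stack([signal[t:t + window] for t in range(n_frames)])

def pipeline(frames, aperture_sigma):
    """Toy version of the pipeline: imbalanced temporal derivative
    (weights -0.95, 1, so the DC component survives) followed by a
    Gaussian aperture; the DoG and low-pass stages are omitted here."""
    dt = frames[1:] - 0.95 * frames[:-1]
    x = np.arange(frames.shape[1]) - frames.shape[1] // 2
    g = np.exp(-x**2 / (2.0 * aperture_sigma**2))
    return dt * g

def covariance_eigenvectors(frames):
    """Unsupervised learning module: eigenvectors of the input covariance."""
    c = frames.T @ frames / len(frames)
    vals, vecs = np.linalg.eigh(c)          # eigh returns ascending order
    return vals[::-1], vecs[:, ::-1]        # reorder to descending

# Random stand-in for a natural image row (1/f amplitude spectrum).
freqs = np.fft.rfftfreq(4096)
freqs[0] = 1.0                              # avoid division by zero at DC
spectrum = (rng.standard_normal(freqs.size)
            + 1j * rng.standard_normal(freqs.size)) / freqs
signal = np.fft.irfft(spectrum)

frames = pipeline(translating_movie(signal, window=64, n_frames=512),
                  aperture_sigma=8.0)
vals, vecs = covariance_eigenvectors(frames)
# The leading eigenvectors are localized under the aperture and oscillatory,
# i.e. qualitatively Gabor-like.
```

The full simulations additionally restore the DoG and low-pass stages and operate on 2-D natural image patches.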

### 1.2 Additional Evidence for Gabor Shapes as Templates in V1

In addition to Jones and Palmer (1987), Niell and Stryker (2008) and Ringach (2002), a recent paper (Kay et al. 2008) shows that the assumption of a system of Gabor wavelets in V1 provides a very good fit to fMRI data. Note that, during unsupervised learning (because of Hebbian synapses), the templates of the theory described in Anselmi et al. (2013a) become Gabor-like eigenfunctions, as described here.

### 1.3 Hebbian Rule and Gabor-Like Functions

In this section we show how, from the hypothesis that the synaptic weights of a simple cell change according to a Hebbian rule, the tuning properties of the simple cells in V1 converge to Gabor-like functions.

We consider, for simplicity, the 1D case (see also Poggio et al. 2013 for a derivation and properties). The associated eigenproblem is
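The displayed equation is missing from this version of the text; a form consistent with the definitions below and with the change of variables used in Sect. 1.3.2 (a reconstruction, not the original typesetting) is

\[
g(y)\int dx\; t^{\circledast }(y-x)\, g(x)\, \psi _{n}(x) \;=\; \nu _{n}\, \psi _{n}(y) \qquad (2)
\]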

where \(t^{\circledast }\) is the autocorrelation function of the template *t*, *g* is a Gaussian function with fixed \(\sigma \), \(\nu _{n}\) are the eigenvalues and \(\psi _{n}\) are the eigenfunctions (see Poggio et al. 2012; Perona 1991 for solutions in the case where there is no Gaussian).

#### 1.3.1 Approximate Ansatz Solution for Piecewise Constant Spectrum

We start by representing the template autocorrelation function through the inverse Fourier transform of its spectrum:
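The display is missing; with the convention (without the \(1/2\pi \) factor) implied by the substitution quoted at the end of this appendix, it presumably reads

\[
t^{\circledast }(x) \;=\; \int d\omega \; t^{\circledast }(\omega )\, e^{i\omega x} \qquad (3)
\]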

Let \(\alpha =1/\sigma ^{2}_{x}\), \(\beta =1/\sigma ^{2}_{\psi }\) and assume that the eigenfunctions have the form \(\psi _{n}(x) = e^{-\frac{\beta }{2}x^2}e^{i\omega _{n}x}\), where \(\beta \) and \(\omega _{n}\) are parameters to be found. Assume also that \(g(x)=\exp (-(\alpha /2)x^{2})\). With these assumptions Eq. (2) reads:

Collecting the terms in *x* and integrating in *x*, we find that the l.h.s. becomes:

With the change of variable \(\bar{\omega }= \omega -\omega _{n}\), and under the hypothesis that \(t^{\circledast }(\bar{\omega }+\omega _{n})\approx \mathrm{const}\) over the significant support of the Gaussian centered at 0, integrating in \(\bar{\omega }\) we obtain:

Notice that this implies an upper bound on \(\beta \): otherwise *t* would be white noise, which is inconsistent with the diffraction-limited optics of the eye. Thus the condition in Eq. (6) holds approximately over the relevant *y* interval, which is between \(- \sigma _{\psi }\) and \(+ \sigma _{\psi }\), and therefore Gabor functions are an approximate solution of Eq. (2).

We now prove that the orthogonality conditions on the eigenfunctions lead to Gabor wavelets. Consider, e.g., the approximate eigenfunction \(\psi _1\) with frequency \( \omega _0\). The minimum value of \(\omega _0\) is set by the condition that \(\psi _1\) must be roughly orthogonal to the constant function (this assumes that the visual input does have a DC component, which implies that there is no exact derivative stage in the input filtering by the retina).

where \(C_{(0,1)}\) is the product of the normalizing factors of the eigenfunctions.

Using \(2\pi f_{0}=\frac{2 \pi }{\lambda _0} = \omega _{0}\) the condition above implies \(e^{-(\frac{\pi \sigma _{\psi }}{\lambda _{0}})^{2}} \approx 0\) which can be satisfied with \(\sigma _{\psi } \ge \lambda _0\); the condition \(\sigma _{\psi } \sim \lambda _0\) is enough since it implies \(e^{-(\frac{\pi \sigma _{\psi }}{\lambda _0})^2} \approx e^{-\pi ^2}\).

Imposing orthogonality of any pair of eigenfunctions:

we obtain a condition similar to the one above. This implies that \(\lambda _{n}\) should increase with the \(\sigma _{\psi }\) of the Gaussian aperture, *which is a property of Gabor wavelets!*, even if this is valid here only for \(n=0,1,2\).

#### 1.3.2 Differential Equation Approach

In this section we describe another approach to the analysis of the cortical equation which is somewhat restricted but interesting for the potential connections with classical problems in mathematical physics.

Suppose, as in the previous paragraph, that \(g(x)= e^{-\frac{\alpha }{2} x^2}\). The eigenproblem (2) can be written as:

or equivalently, multiplying both sides by \(e^{+\frac{\alpha }{2} y^2}\) and defining the function \(\xi _{n}(x)=e^{+\frac{\alpha }{2} x^2}\psi _{n}(x)\), as
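The transformed eigenproblem is not displayed; a form consistent with this change of variables (a reconstruction) is

\[
\int dx\; t^{\circledast }(y-x)\, e^{-\alpha x^{2}}\, \xi _{n}(x) \;=\; \nu _{n}\, \xi _{n}(y) \qquad (9)
\]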

Substituting the decomposition of \(t^{\circledast }(x)\) from Eq. (3) into Eq. (9):

Differentiating twice with respect to the *y* variable and rearranging the order of the integrals:

The expression above is equivalent to the original eigenproblem in Eq. (2) and will provide the same \(\psi \) modulo a first-order polynomial in *x* (we will show the equivalence in a later paragraph, where we specialize the template to natural images).

Indicating with \(\mathfrak {F}\) the Fourier transform, we can rewrite (10) as:

Indicating with \(*\) the convolution operator, by the convolution theorem we have

Expanding \(\omega ^{2}t^{\circledast }(\omega )\) in a Taylor series, \(\omega ^{2}t^{\circledast }(\omega )=\sum _{i}c_{i}\omega ^{i}\), and remembering that \(\mathfrak {F}^{-1}(\omega ^{m})=i^{m}\sqrt{2\pi }\delta ^{m}(x)\), we are finally led to

The differential equation so obtained is difficult to solve for a generic power spectrum. In the next paragraph we study the case where we can obtain explicit solutions.

**Case**: \(1/\omega ^{2}\) **Power Spectrum**

In the case of the average power spectrum of natural images,

the differential equation (11) assumes the particularly simple form

In the harmonic approximation \(e^{-\alpha y^2} \approx 1-\alpha y^2\) (valid for \(\sqrt{\alpha }y\ll 1\)), we have
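The missing displays can be reconstructed from the substitution \(e^{-\alpha x^{2}}\xi (x) \rightarrow -\xi ''(x)\nu /2\pi \) quoted in the equivalence paragraph below; for \(t^{\circledast }(\omega )=1/\omega ^{2}\) the differential equation and its harmonic approximation then presumably read

\[
\nu _{n}\,\xi ''_{n}(y) + 2\pi \, e^{-\alpha y^{2}}\, \xi _{n}(y) = 0, \qquad \nu _{n}\,\xi ''_{n}(y) + 2\pi \,(1-\alpha y^{2})\, \xi _{n}(y) \approx 0.
\]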

The equation above has the form of a so-called Weber differential equation:

The general solutions of Eq. (12) are:

where \(D(\eta ,y)\) are parabolic cylinder functions and \(C_{1},C_{2}\) are constants. It can be proved (Müller-Kirsten (2012), p. 139) that the solutions have two different behaviors, exponentially increasing or exponentially decreasing, and that we have exponentially decreasing real solutions if \(C_{2}=0\) and the following quantization condition holds:

Therefore, remembering that \(\alpha =1/\sigma ^{2}_{x}\), we obtain the spectrum quantization condition
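Carrying out the quantization for the harmonic approximation (a reconstruction under the assumptions above, not the original display) gives

\[
\nu _{n} \;=\; \frac{2\pi \,\sigma _{x}^{2}}{(2n+1)^{2}}\,, \qquad n=0,1,2,\ldots
\]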

Further, using the identity (true if \(n\in \mathbb {N}\)):

where \(H_{n}(y)\) are Hermite polynomials, we have:

i.e.
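A closed form consistent with the \(\sqrt{2}\) aperture ratio and with the oscillation frequency \(\omega _{n}=(2n+1)/\sigma _{x}\) derived below (a reconstruction, up to normalization constants \(c_{n}\)) is

\[
\psi _{n}(y) \;=\; c_{n}\, H_{n}\!\left(\frac{\sqrt{2n+1}}{\sigma _{x}}\, y\right) e^{-(n+1)\,y^{2}/\sigma _{x}^{2}} \qquad (15)
\]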

The solutions, plotted in Fig. 6, approximate Gabor functions very well.

*Remark*: The solution in Eq. (15) is also an approximate solution for any template spectrum such that

This is important since it shows that the solutions are robust to small changes in the power spectrum of natural images.

**Aperture Ratio**

Using Eq. (15), the ratio between the width of the Gaussian aperture and that of the eigenfunctions can be calculated as
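Assuming eigenfunctions with Gaussian factor \(e^{-(n+1)\,y^{2}/\sigma _{x}^{2}}\) (consistent with the statement below that the first ratio is \(\sqrt{2}\)), the missing display presumably reads

\[
\frac{\sigma _{x}}{\sigma _{\psi _{n}}} \;=\; \sqrt{2(n+1)} \qquad (16)
\]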

If we consider the first eigenfunction the ratio is \(\sqrt{2}\).

**Oscillating Behavior of Solutions**

Although there isn’t an explicit expression for the frequency of the oscillating part of \(\psi _{n}\) in Eq. (15), we can use the following approximation, which estimates the frequency from the first crossing points of \(H_{n}(x)\), i.e. \(\pm \sqrt{2n+1}\) (Boyd 1984). The oscillating part of (15) can therefore be written, in this approximation, as

which gives \(\omega _{n} =(2n+1)/\sigma _{x}\). Using \(\omega _{n}=2\pi /\lambda _{n}\) and (16) we finally have
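Using \(\omega _{n}=2\pi /\lambda _{n}\), and assuming \(\sigma _{\psi _{n}}=\sigma _{x}/\sqrt{2(n+1)}\) for the width of the eigenfunctions (consistent with the \(\sqrt{2}\) aperture ratio for the first one), the missing display presumably reads

\[
\lambda _{n} \;=\; \frac{2\pi \,\sigma _{x}}{2n+1}\,, \qquad \frac{\sigma _{\psi _{n}}}{\lambda _{n}} \;=\; \frac{2n+1}{2\pi \sqrt{2(n+1)}}\,.
\]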

The above equation conveys an important piece of information: for any fixed eigenfunction \(n=\bar{n}\), the number of oscillations under the Gaussian envelope is constant.

**Equivalence of the Integral and Differential Equations**

To prove that the solutions of the eigenproblem (9) are equivalent to those of (13), we start from the observation that, in the case of the natural-image power spectrum, we can write (3) explicitly:

The integral equation (9) can be written, for \(a>0,\;a\rightarrow \infty \),

where for simplicity we dropped the index *n* and \(c=\sqrt{\pi }/(\sqrt{2}\nu )\). Removing the modulus

Setting \(y=a,\;y=-a\) we can derive the boundary conditions

Substituting, using the differential equation, \(e^{-\alpha x^{2}}\xi (x)\) with \(-\xi ''(x)\nu /2\pi \) and integrating by parts we have

where \(c'=1/2\sqrt{2}\). The above two boundary conditions, together with the differential equation, are equivalent to the initial integral eigenproblem. Bounded solutions at infinity require \(\xi (\infty )=\xi (-\infty )=\xi '(\infty )=\xi '(-\infty )=0\).

### 1.4 Motion Determines a Consistent Orientation of the Gabor-Like Eigenfunctions

Consider a 2D image moving through time *t*, \(I(x(t),y(t))=I(\mathbf{{x}}(t))\), filtered, as in the pipeline of Fig. 4, by a spatial low-pass filter and a band-pass filter, and call the output \(f(\mathbf{{x}}(t))\).

Suppose now that temporal filtering is performed by a high-pass impulse response *h*(*t*); for example, let \(h(t) \sim \frac{d}{dt}\). We consider the effect of the time derivative on the translated signal \(\mathbf{x}(t)=\mathbf{x}-\mathbf{v}t\), where \(\mathbf {v}\in \mathbb {R}^2\) is the velocity vector:
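For \(h(t)\sim d/dt\), the missing display presumably reads

\[
\frac{d}{dt}\, f(\mathbf{x}-\mathbf{v}t) \;=\; -\,\mathbf{v}\cdot \nabla f(\mathbf{x}-\mathbf{v}t). \qquad (18)
\]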

If, for instance, the direction of motion is along the *x* axis with constant velocity, \(\mathbf {v}=(v_{x},0)\), then Eq. (18) becomes

or, in Fourier domain of spatial and temporal frequencies:
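With \(\mathbf {v}=(v_{x},0)\), the Fourier-domain form presumably reads

\[
\mathfrak {F}\!\left[\tfrac{d}{dt} f\right](\omega _{x},\omega _{y}) \;=\; -\,i\, v_{x}\, \omega _{x}\, \hat{f}(\omega _{x},\omega _{y}), \qquad (19)
\]

so the amplitude spectrum is multiplied by \(|v_{x}\omega _{x}|\).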

Consider now an image *I* with a symmetric spectrum \(1/(\sqrt{\omega ^{2}_{x}+\omega ^{2}_{y}})\). Equation (19) shows that the effect of the time derivative is to break the radial symmetry of the spectrum in the direction of motion (depending on the value of \(v_{x}\)). Intuitively, spatial frequencies in the *x* direction are enhanced. Thus motion effectively selects a specific orientation: it enhances the spatial frequencies along the direction of motion, so that the resulting receptive fields prefer edges orthogonal to it.

Thus the theory suggests that motion effectively “selects” the direction of the Gabor-like function (see the previous section) during the emergence and maintenance of simple-cell tuning. It turns out that, in addition to orientation, other features of the eigenvectors are shaped by motion during learning. This is shown by a simulation equivalent to that presented in Fig. 1, but in which the order of the frames was scrambled before the time-derivative stage. The receptive fields are still Gabor-like functions but lack the important property of having \(\sigma _x \propto \lambda \). This is summarized in Fig. 7.
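The convergence of Hebbian synapses to the principal eigenvector of the input covariance, assumed throughout this appendix, can be sketched with Oja's rule (Oja 1982). A minimal demonstration (dimension, data and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic inputs with a known dominant direction of variance
# (all sizes and rates here are illustrative).
d = 20
principal = np.sin(np.linspace(0.0, np.pi, d))
principal /= np.linalg.norm(principal)
x_samples = (3.0 * rng.standard_normal((5000, 1))) * principal \
            + 0.1 * rng.standard_normal((5000, d))

# Oja's rule: w += eta * y * (x - y * w), a normalized Hebbian update
# whose stable fixed point is the leading eigenvector of the covariance.
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
eta = 0.01
for x in x_samples:
    y = w @ x
    w += eta * y * (x - y * w)

# Compare with the top eigenvector computed directly.
cov = x_samples.T @ x_samples / len(x_samples)
top = np.linalg.eigh(cov)[1][:, -1]
alignment = abs(w @ top) / np.linalg.norm(w)
```

Convergence to eigenfunctions other than the first, discussed in Sect. 1.5, would additionally require lateral inhibition (or deflation), which this sketch omits.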

The theory also predicts (assuming that the cortical equation provides a perfect description of Hebbian synapses) that the even eigenfunctions have slightly different \(n_x, n_y\) relations than the odd eigenfunctions. It is unlikely that experimental data will allow this small difference to be distinguished.

### 1.5 Phase of Gabor RFs

We do not analyze the phase here, for a variety of reasons. The main one is that phase measurements are rather variable within each species and across species; phase is also difficult to measure. The general shape shows a peak at 0 and a higher peak at 90°. These peaks are consistent with the \(n=2\) (even) and \(n=1\) (odd) eigenfunctions of Eq. 2 (the zeroth-order eigenfunction is not included in the graph). The relative frequency of each peak would depend, according to our theory, on the dynamics (Oja equation) of learning and on the properties of the lateral inhibition between simple cells (needed to converge to eigenfunctions other than the first one). It is in any case interesting that the experimental data fit our predictions *qualitatively*: the odd eigenfunction \(\psi _1\) of the cortical equation should appear more often (because of its larger power) than the even eigenfunction \(\psi _2\), and no other ones with intermediate phases should exist, at least in the noiseless case (Fig. 8).


## Copyright information

© 2017 Springer Science+Business Media Singapore

## About this chapter

### Cite this chapter

Mutch, J., Anselmi, F., Tacchetti, A., Rosasco, L., Leibo, J.Z., Poggio, T. (2017). Invariant Recognition Predicts Tuning of Neurons in Sensory Cortex. In: Zhao, Q. (eds) Computational and Cognitive Neuroscience of Vision. Cognitive Science and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-10-0213-7_5


Print ISBN: 978-981-10-0211-3

Online ISBN: 978-981-10-0213-7
