
Invariant Recognition Predicts Tuning of Neurons in Sensory Cortex

Computational and Cognitive Neuroscience of Vision

Abstract

Tuning properties of simple cells in cortical V1 can be described in terms of a “universal shape” characterized quantitatively by parameter values which hold across different species (Jones and Palmer 1987; Ringach 2002; Niell and Stryker 2008). This puzzling set of findings begs for a general explanation grounded in an evolutionarily important computational function of the visual cortex. We show here that these properties are quantitatively predicted by the hypothesis that the goal of the ventral stream is to compute for each image a “signature” vector which is invariant to geometric transformations (Anselmi et al. 2013b). The mechanism for continuously learning and maintaining invariance may be the memory storage of a sequence of neural images of a few (arbitrary) objects via Hebbian synapses, while undergoing transformations such as translation, scale changes and rotation. For V1 simple cells this hypothesis implies that the tuning of neurons converges to the eigenvectors of the covariance of their input. Starting with a set of dendritic fields spanning a range of sizes, we show, with simulations suggested by a direct analysis, that the solution of the associated “cortical equation” provides a set of Gabor-like shapes with parameter values that quantitatively agree with the physiology data. The same theory provides predictions about the tuning of cells in V4 and in the face patch AL (Leibo et al. 2013a) which are in qualitative agreement with physiology data.


References

  • Abdel-Hamid O, Mohamed A, Jiang H, Penn G (2012) Applying convolutional neural networks concepts to hybrid nn-hmm model for speech recognition. In: 2012 IEEE International conference on acoustics, speech and signal processing (ICASSP), pp 4277–4280. IEEE

  • Anselmi F, Leibo JZ, Mutch J, Rosasco L, Tacchetti A, Poggio T (2013a) Part I: computation of invariant representations in visual cortex and in deep convolutional architectures. In preparation

  • Anselmi F, Leibo JZ, Rosasco L, Mutch J, Tacchetti A, Poggio T (2013b) Unsupervised learning of invariant representations in hierarchical architectures. Theoret Comput Sci. CBMM Memo n 1, in press. arXiv:1311.4158

  • Anselmi F, Poggio T (2010) Representation learning in sensory cortex: a theory. CBMM memo n 26

  • Bell A, Sejnowski T (1997) The independent components of natural scenes are edge filters. Vis Res 3327–3338

  • Boyd J (1984) Asymptotic coefficients of hermite function series. J Comput Phys 54:382–410

  • Croner L, Kaplan E (1995) Receptive fields of p and m ganglion cells across the primate retina. Vis Res 35(1):7–24

  • Dan Y, Atick JJ, Reid RC (1996) Efficient coding of natural scenes in the lateral geniculate nucleus: experimental test of a computational theory. J Neurosci 16:3351–3362

  • Földiák P (1991) Learning invariance from transformation sequences. Neural Comput 3(2):194–200

  • Freiwald W, Tsao D (2010) Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science 330(6005):845

  • Fukushima K (1980) Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36(4):193–202

  • Gallant J, Connor C, Rakshit S, Lewis J, Van Essen D (1996) Neural responses to polar, hyperbolic, and Cartesian gratings in area V4 of the macaque monkey. J Neurophysiol 76:2718–2739

  • Hebb DO (1949) The organization of behaviour: a neuropsychological theory. Wiley

  • Hyvärinen A, Oja E (1998) Independent component analysis by general non-linear Hebbian-like learning rules. Signal Proces 64:301–313

  • Jones JP, Palmer LA (1987) An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex. J Neurophysiol 58(6):1233–1258

  • Kay K, Naselaris T, Prenger R, Gallant J (2008) Identifying natural images from human brain activity. Nature 452(7185):352–355

  • Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Proces Syst 25

  • Le QV, Monga R, Devin M, Corrado G, Chen K, Ranzato M, Dean J, Ng AY (2011) Building high-level features using large scale unsupervised learning. CoRR. arXiv:1112.6209

  • LeCun Y, Boser B, Denker J, Henderson D, Howard R, Hubbard W, Jackel L (1989) Backpropagation applied to handwritten zip code recognition. Neural Comput 1(4):541–551

  • LeCun Y, Bengio Y (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks, pp 255–258

  • Leibo JZ, Anselmi F, Mutch J, Ebihara AF, Freiwald WA, Poggio T (2013a) View-invariance and mirror-symmetric tuning in a model of the macaque face-processing system. Comput Syst Neurosci I–54. Salt Lake City, USA

  • Leibo JZ, Anselmi F, Mutch J, Ebihara AF, Freiwald WA, Poggio T (2013b) View-invariance and mirror-symmetric tuning in a model of the macaque face-processing system. Comput Syst Neurosci (COSYNE)

  • Li N, DiCarlo JJ (2008) Unsupervised natural experience rapidly alters invariant object representation in visual cortex. Science 321(5895):1502–1507

  • Mallat S (2012) Group invariant scattering. Commun Pure Appl Math 65(10):1331–1398

  • Meister M, Wong R, Baylor DA, Shatz CJ et al (1991) Synchronous bursts of action potentials in ganglion cells of the developing mammalian retina. Science 252(5008):939–943

  • Mel BW (1997) SEEMORE: combining color, shape, and texture histogramming in a neurally inspired approach to visual object recognition. Neural Comput 9(4):777–804

  • Müller-Kirsten HJW (2012) Introduction to quantum mechanics: Schrödinger equation and path integral, 2nd edn. World Scientific, Singapore

  • Mutch J, Lowe D (2008) Object class recognition and localization using sparse features with limited receptive fields. Int J Comput Vis 80(1):45–57

  • Niell C, Stryker M (2008) Highly selective receptive fields in mouse visual cortex. J Neurosci 28(30):7520–7536

  • Oja E (1982) Simplified neuron model as a principal component analyzer. J Math Biol 15(3):267–273

  • Oja E (1992) Principal components, minor components, and linear neural networks. Neural Netw 5(6):927–935

  • Olshausen BA, Cadieu CF, Warland D (2009) Learning real and complex overcomplete representations from the statistics of natural images. In: Goyal VK, Papadakis M, van de Ville D (eds) SPIE Proceedings, vol. 7446: Wavelets XIII

  • Olshausen B et al (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607–609

  • Perona P (1991) Deformable kernels for early vision. IEEE Trans Pattern Anal Mach Intell 17:488–499

  • Perrett D, Oram M (1993) Neurophysiology of shape processing. Image Vis Comput 11(6):317–333

  • Pinto N, DiCarlo JJ, Cox D (2009) How far can you get with a modern face recognition test set using only simple features? In: CVPR 2009. IEEE Conference on computer vision and pattern recognition, 2009. IEEE, pp 2591–2598

  • Poggio T, Edelman S (1990) A network that learns to recognize three-dimensional objects. Nature 343(6255):263–266

  • Poggio T, Mutch J, Anselmi F, Leibo JZ, Rosasco L, Tacchetti A (2011) Invariances determine the hierarchical architecture and the tuning properties of the ventral stream. Technical report available online, MIT CBCL, 2013. Previously released as MIT-CSAIL-TR-2012-035, 2012 and in Nature Precedings, 2011

  • Poggio T, Mutch J, Anselmi F, Leibo JZ, Rosasco L, Tacchetti A (2012) The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work). Technical report MIT-CSAIL-TR-2012-035, MIT Computer Science and Artificial Intelligence Laboratory, 2012. Previously released in Nature Precedings, 2011

  • Poggio T, Mutch J, Isik L (2014) Computational role of eccentricity dependent cortical magnification. CBMM Memo No. 017. CBMM Funded. arXiv:1406.1770v1

  • Rehn M, Sommer FT (2007) A network that uses few active neurones to code visual input predicts the diverse shapes of cortical receptive fields. J Comput Neurosci 22(2):135–146

  • Riesenhuber M, Poggio T (1999) Hierarchical models of object recognition in cortex. Nature Neurosci. 2(11):1019–1025

  • Ringach D (2002) Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex. J Neurophysiol 88(1):455–463

  • Saxe AM, Bhand M, Mudur R, Suresh B, Ng AY (2011) Unsupervised learning models of primary cortical receptive fields and receptive field plasticity. In: Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger K (eds) Advances in neural information processing systems, vol 24, pp 1971–1979

  • Serre T, Wolf L, Bileschi S, Riesenhuber M, Poggio T (2007) Robust object recognition with cortex-like mechanisms. IEEE Trans Pattern Anal Mach Intell 29(3):411–426

  • Stevens CF (2004) Preserving properties of object shape by computations in primary visual cortex. PNAS 101(11):15524–15529

  • Stringer S, Rolls E (2002) Invariant object recognition in the visual system with novel views of 3D objects. Neural Comput 14(11):2585–2596

  • Torralba A, Oliva A (2003) Statistics of natural image categories. In: Network: computation in neural systems, pp 391–412

  • Turrigiano GG, Nelson SB (2004) Homeostatic plasticity in the developing nervous system. Nature Rev Neurosci 5(2):97–107

  • Wong R, Meister M, Shatz C (1993) Transient period of correlated bursting activity during development of the mammalian retina. Neuron 11(5):923–938

  • Zylberberg J, Murphy JT, DeWeese MR (2011) A sparse coding model with synaptically local plasticity and spiking neurons can account for the diverse shapes of v1 simple cell receptive fields. PLoS Comput Biol, 7(10):135–146


Acknowledgments

This work was supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF 1231216.

Author information

Corresponding author

Correspondence to Fabio Anselmi.

1 Appendix

1.1 Retinal Processing

Our simulation pipeline consists of several filtering steps that mimic retinal processing, followed by a Gaussian mask, as shown in Fig. 4. Values for the DoG filter were those suggested by Croner and Kaplan (1995); the spatial low-pass filter has frequency response \(1/\sqrt{\omega ^{2}_{x}+\omega ^{2}_{y}}\). The temporal derivative is performed using imbalanced weights (−0.95, 1) so that the DC component is not zero. Each cell learns by extracting the principal components of a movie generated by a natural image patch undergoing a rigid translation. Each frame goes through the pipeline described here and is then fed to the unsupervised learning module (which computes the eigenvectors of the covariance). We used 40 natural images and 19 different Gaussian apertures for the simulations presented in this chapter (Fig. 5).
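The learning stage described above can be sketched in 1D as follows. This is only an illustration under stated assumptions, not the simulation code used for the chapter: the natural image is replaced by a synthetic signal with a \(1/\omega \) amplitude spectrum and random phases, the Gaussian blur, DoG and spatial low-pass stages are omitted, and the function name `learn_templates` and all sizes are hypothetical.

```python
import numpy as np

def learn_templates(sigma=8.0, n_patch=64, n_frames=400, n_signal=4096, seed=0):
    """PCA of a translating 1D patch seen through a Gaussian aperture (sketch)."""
    rng = np.random.default_rng(seed)

    # Assumption: synthetic stand-in for a natural image row,
    # with a 1/omega amplitude spectrum and random phases.
    freqs = np.fft.rfftfreq(n_signal)
    amplitude = np.zeros_like(freqs)
    amplitude[1:] = 1.0 / freqs[1:]
    phases = rng.uniform(0.0, 2.0 * np.pi, freqs.size)
    signal = np.fft.irfft(amplitude * np.exp(1j * phases), n=n_signal)

    # Movie of a rigid translation: the patch shifts by one sample per frame.
    frames = np.stack([signal[t:t + n_patch] for t in range(n_frames + 1)])

    # Imperfect temporal derivative with imbalanced weights (-0.95, 1),
    # so that the DC component is not removed completely.
    filtered = frames[1:] - 0.95 * frames[:-1]

    # Gaussian mask (aperture) of width sigma applied to every frame.
    xs = np.arange(n_patch) - (n_patch - 1) / 2.0
    masked = filtered * np.exp(-xs**2 / (2.0 * sigma**2))

    # Unsupervised learning module: eigenvectors of the input covariance.
    cov = masked.T @ masked / n_frames
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending order
    return eigvals[::-1], eigvecs[:, ::-1]   # largest eigenvalue first
```

Calling `learn_templates(sigma=s)` for several aperture widths `s` gives one family of candidate templates per aperture, which is the kind of sweep summarized in Fig. 5.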

Fig. 4

Retinal processing pipeline used for V1 simulations. Gabor-like filters are obtained irrespective of the presence or absence of any element of the pipeline; however, the DoG filter is important in 1D and 2D for the emergence of actual Gabor wavelets with the correct dependence of \(\lambda \) on \(\sigma \), and the spatial low-pass filter together with the temporal derivative are necessary in our simulation to constrain \(\lambda \) to be proportional to \(\sigma \).

Fig. 5

Summary plots for 2D simulations of V1 cells trained according to the pipeline described in Fig. 1. Figures from top left to bottom right: sinusoid wavelength (\(\lambda \)) versus Gaussian aperture width (\(\sigma _\alpha \)); sinusoid wavelength (\(\lambda \)) versus Gaussian envelope width on the modulated direction (\(\sigma \)); Gaussian envelope width for the modulated direction (\(\sigma \)) versus Gaussian aperture width (\(\sigma _\alpha \)); ratio between sinusoid wavelength and Gaussian envelope width for the modulated direction (\(n_x\)) versus Gaussian aperture width (\(\sigma _\alpha \)); ratio between sinusoid wavelength and Gaussian envelope width on the unmodulated direction (\(n_y\)) versus ratio between sinusoid wavelength and Gaussian envelope width for the modulated direction (\(n_x\)). The pipeline consists of a Gaussian blur, a DoG filter, a spatial low-pass filter \(1/\sqrt{\omega _x^2 + \omega _y^2}\) and an imperfect temporal derivative. Parameters for all filters were set to values measured in macaque monkeys by neurophysiologists.

1.2 Additional Evidence for Gabor Shapes as Templates in V1

In addition to Jones and Palmer (1987), Niell and Stryker (2008) and Ringach (2002), a recent paper (Kay et al. 2008) shows that the assumption of a system of Gabor wavelets in V1 fits fMRI data very well. Note that, because of Hebbian synapses, the templates of the theory described in Anselmi et al. (2013a) become Gabor-like eigenfunctions during unsupervised learning, as described here.

1.3 Hebbian Rule and Gabor-Like Functions

In this section we show how, from the hypothesis that the synaptic weights of a simple cell change according to a Hebbian rule, the tuning properties of the simple cells in V1 converge to Gabor-like functions.
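As a minimal illustration of why a Hebbian rule leads to eigenvectors of the input covariance, the sketch below applies Oja's rule (Oja 1982) to synthetic data and compares the learned weight vector with the leading eigenvector. The data distribution, learning rate, sample count and function names are assumptions chosen for the example, not values taken from the chapter.

```python
import numpy as np

def oja_first_component(samples, learning_rate=0.002, seed=0):
    """Run Oja's rule once over the samples and return the weight vector."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=samples.shape[1])
    w /= np.linalg.norm(w)
    for x in samples:
        y = w @ x                               # postsynaptic response
        w += learning_rate * y * (x - y * w)    # Hebbian term with Oja's normalizing decay
    return w

def alignment_with_top_eigenvector(n_samples=20000, seed=1):
    """Return |cos| of the angle between the Oja weights and the top eigenvector."""
    rng = np.random.default_rng(seed)
    # Assumption: zero-mean Gaussian inputs with a hypothetical covariance spectrum.
    variances = np.array([5.0, 2.0, 1.0, 0.5, 0.2])
    basis, _ = np.linalg.qr(rng.normal(size=(5, 5)))
    samples = (rng.normal(size=(n_samples, 5)) * np.sqrt(variances)) @ basis.T
    w = oja_first_component(samples)
    cov = samples.T @ samples / n_samples
    top = np.linalg.eigh(cov)[1][:, -1]         # eigenvector of the largest eigenvalue
    return abs(w @ top) / np.linalg.norm(w)
```

With these settings `alignment_with_top_eigenvector()` should be close to 1, i.e. the weights align with the first principal component. Convergence to other eigenvectors requires additional mechanisms that are not modeled in this sketch.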

We consider, for simplicity, the 1D case (see also Poggio et al. 2013 for a derivation and properties). The associated eigenproblem is

$$\begin{aligned} \int dx g(y) g(x) \psi _n(x) t^{\circledast }(y-x) = \nu _n \psi _n(y) \end{aligned}$$
(2)

where \(t^{\circledast }\) is the autocorrelation function of the template t, g is a Gaussian function with fixed \(\sigma \), \(\nu _{n}\) are the eigenvalues and \(\psi _{n}\) are the eigenfunctions (see Poggio et al. 2012; Perona 1991 for solutions in the case where there is no Gaussian).
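Eq. (2) can also be explored numerically by discretizing the kernel \(g(y)g(x)t^{\circledast }(y-x)\) on a grid and diagonalizing the resulting symmetric matrix. The sketch below does this under an assumption that is not in the text: the template autocorrelation is taken to be \(e^{-|x|/\ell }\) with a hypothetical correlation length \(\ell \). The function name and grid parameters are likewise illustrative.

```python
import numpy as np

def cortical_eigenfunctions(sigma=1.0, corr_length=0.5, half_width=6.0, n_points=401):
    """Discretize  int dx g(y) g(x) t(y - x) psi(x) = nu psi(y)  and solve it."""
    x = np.linspace(-half_width, half_width, n_points)
    h = x[1] - x[0]
    g = np.exp(-x**2 / (2.0 * sigma**2))                 # Gaussian aperture
    # Assumption: exponential autocorrelation as a stand-in for t^{(*)}.
    t_auto = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_length)
    kernel = g[:, None] * g[None, :] * t_auto * h        # symmetric matrix
    nu, psi = np.linalg.eigh(kernel)                     # ascending eigenvalues
    return x, nu[::-1], psi[:, ::-1]                     # largest eigenvalue first
```

The columns of `psi` are the discrete eigenfunctions. Plotting the first few against `x` shows oscillations under a Gaussian-like envelope; how closely they resemble Gabor functions depends on the assumed autocorrelation.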

1.3.1 Approximate Ansatz Solution for Piecewise Constant Spectrum

We start by representing the template autocorrelation function as the inverse of its Fourier transform:

$$\begin{aligned} t^{\circledast }(x)=\frac{1}{\sqrt{2\pi }}\int d\omega \;t^{\circledast }(\omega )e^{i\omega x}. \end{aligned}$$
(3)

Let \(\alpha =1/\sigma ^{2}_{x}\), \(\beta =1/\sigma ^{2}_{\psi }\) and assume that the eigenfunctions have the form \(\psi _{n}(x) = e^{-\frac{\beta }{2}x^2}e^{i\omega _{n}x}\), where \(\beta \) and \(\omega _{n}\) are parameters to be found. Assume also that \(g(x)=\exp (-(\alpha /2)x^{2})\). With these assumptions Eq. (2) reads:

$$\begin{aligned} \frac{1}{\sqrt{2\pi }}e^{-\frac{\alpha }{2}y^2}\int dx\; e^{-\frac{x^2 (\alpha +\beta )}{2}}\int d\omega \;t^{\circledast }(\omega )e^{i\omega (y-x)}e^{i\omega _{n}x}= \nu (\omega _{n})e^{-\frac{\beta y^2}{2}}e^{i\omega _{n} y}. \end{aligned}$$
(4)

Collecting the terms in x and integrating over x, the left-hand side becomes:

$$\begin{aligned} \sqrt{\frac{1}{\alpha +\beta }}e^{-\frac{\alpha }{2}y^2}\int d\omega \;t^{\circledast }(\omega )e^{i\omega y}e^{-\frac{(\omega -\omega _{n})^{2}}{2(\alpha +\beta )}}. \end{aligned}$$
(5)

With the change of variable \(\bar{\omega }= \omega -\omega _{n}\), and under the hypothesis that \(t^{\circledast }(\bar{\omega }+\omega _{n})\approx const\) over the significant support of the Gaussian centered in 0, integrating over \(\bar{\omega }\) we obtain:

$$\begin{aligned} \sqrt{2\pi } e^{-\frac{y^2 \alpha }{2}} e^{i \omega _{n} y} e^{- \frac{y^2 (\alpha +\beta )}{2}} \sim \nu (\omega _{n}) e^{-\frac{y^2 \beta }{2}} e^{i\omega _{n} y}. \end{aligned}$$
(6)

Notice that this implies an upper bound on \(\beta \) since otherwise t would be white noise which is inconsistent with the diffraction-limited optics of the eye. Thus the condition in Eq. (6) holds approximately over the relevant y interval which is between \(- \sigma _{\psi }\) and \(+ \sigma _{\psi }\) and therefore Gabor functions are an approximate solution of Eq. (2).

We prove now that the orthogonality conditions of the eigenfunctions lead to Gabor wavelets. Consider, e.g., the approximate eigenfunction \(\psi _1\) with frequency \( \omega _0\). The minimum value of \(\omega _0\) is set by the condition that \(\psi _1\) has to be roughly orthogonal to the constant (this assumes that the visual input does have a DC component, which implies that there is no exact derivative stage in the input filtering by the retina).

$$\begin{aligned} \langle \psi _{0},\psi _{1}\rangle =C_{(0,1)}\;\int dx\; e^{-\beta x^2}e^{-i\omega _{0}x}=0 \; \Rightarrow \;e^{-\frac{\omega _{0}^{2}}{4\beta }} \approx 0 \end{aligned}$$
(7)

where \(C_{(0,1)}\) is the multiplication of the normalizing factors of the eigenfunctions.

Using \(2\pi f_{0}=\frac{2 \pi }{\lambda _0} = \omega _{0}\) the condition above implies \(e^{-(\frac{\pi \sigma _{\psi }}{\lambda _{0}})^{2}} \approx 0\) which can be satisfied with \(\sigma _{\psi } \ge \lambda _0\); the condition \(\sigma _{\psi } \sim \lambda _0\) is enough since it implies \(e^{-(\frac{\pi \sigma _{\psi }}{\lambda _0})^2} \approx e^{-\pi ^2}\).
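The size of the overlap in Eq. (7) can be checked numerically. The sketch below (function names are illustrative) evaluates the normalized integral \(\int dx\, e^{-\beta x^2}e^{-i\omega _0 x}/\int dx\, e^{-\beta x^2}\) on a grid and compares it with the closed form \(e^{-(\pi \sigma _{\psi }/\lambda _0)^2}\), using \(\beta =1/\sigma ^{2}_{\psi }\) as defined above.

```python
import numpy as np

def normalized_overlap(sigma_psi, wavelength, n_points=20001, half_width=10.0):
    """Numerical value of |int e^{-beta x^2} e^{-i omega0 x} dx| / int e^{-beta x^2} dx."""
    beta = 1.0 / sigma_psi**2
    omega0 = 2.0 * np.pi / wavelength
    x = np.linspace(-half_width * sigma_psi, half_width * sigma_psi, n_points)
    envelope = np.exp(-beta * x**2)
    # The grid spacing cancels in the ratio, so plain sums are sufficient.
    return abs(np.sum(envelope * np.exp(-1j * omega0 * x)) / np.sum(envelope))

def closed_form_overlap(sigma_psi, wavelength):
    """exp(-(pi sigma_psi / lambda_0)^2), i.e. exp(-omega0^2 / (4 beta))."""
    return np.exp(-(np.pi * sigma_psi / wavelength) ** 2)
```

For \(\sigma _{\psi } = \lambda _0\) both functions give \(e^{-\pi ^2}\), which is of order \(10^{-5}\), while for \(\sigma _{\psi } = \lambda _0/2\) the overlap is \(e^{-\pi ^2/4}\), roughly 0.08, so the orthogonality condition is only poorly satisfied when the envelope is much narrower than the wavelength.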

Imposing orthogonality of any pair of eigenfunctions:

$$\begin{aligned} \int dx\; \psi ^{*}_n(x) \psi _m(x) = const(m,n)\int dx e^{-\beta x^2} e^{ i n \omega _0 x} e^{- i m \omega _0 x} \propto e^{- \frac{((m-n) \omega _0)^2 \sigma ^{2}_{\psi }}{4}}, \end{aligned}$$

we obtain a condition similar to the one above. This implies that \(\lambda _{n}\) should increase with \(\sigma _{\psi }\) of the Gaussian aperture, which is a property of Gabor wavelets, even if this is valid here only for \(n=0,1,2\).

1.3.2 Differential Equation Approach

In this section we describe another approach to the analysis of the cortical equation which is somewhat restricted but interesting for the potential connections with classical problems in mathematical physics.

Suppose as in the previous paragraph \(g(x)= e^{-\frac{\alpha }{2} x^2}\). The eigenproblem (2) can be written as:

$$\begin{aligned} \nu _{n}\psi _{n}(y)-e^{-\frac{\alpha }{2} y^2}\int \;dx\;e^{-\frac{\alpha }{2}x^2}t^{\circledast }(y-x)\psi _{n}(x) = 0, \end{aligned}$$
(8)

or equivalently, multiplying both sides by \(e^{+\frac{\alpha }{2} y^2}\) and defining the function \(\xi _{n}(x)=e^{+\frac{\alpha }{2} x^2}\psi _{n}(x)\), as

$$\begin{aligned} \nu _{n}\xi _{n}(y)-\int \;dx\;e^{-\alpha x^{2}}t^{\circledast }(y-x)\xi _{n}(x) = 0. \end{aligned}$$
(9)

Decomposing \(t^{\circledast }(x)\) as in Eq. (3) in Eq. (9):

$$ \nu _{n}\xi _{n}(y)-\frac{1}{\sqrt{2\pi }}\int \;dx\;e^{-\alpha x^{2}}\int \;d\omega \;t^{\circledast }(\omega )e^{i\omega (y-x)}\xi _{n}(x) = 0. $$

Differentiating twice with respect to y and rearranging the order of the integrals:

$$\begin{aligned} \nu _{n}\xi ^{''}_{n}(y)+\frac{1}{\sqrt{2\pi }}\int \;d\omega \;\omega ^{2}t^{\circledast }(\omega )e^{i\omega y}\int \;dx\;e^{-\alpha x^2}\xi _{n}(x)e^{-i\omega x}=0. \end{aligned}$$
(10)

The expression above is equivalent to the original eigenproblem in Eq. (2) and will provide the same \(\psi \) modulo a first order polynomial in x (we will show the equivalence in the next paragraph where we specialize the template to natural images).

Indicating with \(\mathfrak {F}\) the Fourier transform we can rewrite (10) as:

$$ \nu _{n}\xi ^{''}_{n}(y)+\sqrt{2\pi }\mathfrak {F}^{-1}\Big (\omega ^{2}t^{\circledast }(\omega )\mathfrak {F}\big (e^{-\alpha x^2}\xi _{n}(x)\big )\Big )=0. $$

Indicating with \(*\) the convolution operator, by the convolution theorem

$$f*g = \mathfrak {F}^{-1}(\mathfrak {F}(f)\mathfrak {F}(g)),\;\;\forall f,g\in L^{2}(\mathbb {R})$$

we have

$$ \nu _{n}\xi ^{''}_{n}(y)+\sqrt{2\pi }\mathfrak {F}^{-1}\big (\omega ^{2}t^{\circledast }(\omega )\big )*\big (e^{-\alpha x^2}\xi _{n}(x)\big )=0. $$

Expanding \(\omega ^{2}t^{\circledast }(\omega )\) in Taylor series, \(\omega ^{2}t^{\circledast }(\omega )=\sum _{i}c_{i}\omega ^{i}\), and remembering that \(\mathfrak {F}^{-1}(\omega ^{m})=i^{m}\sqrt{2\pi }\delta ^{m}(x)\), we are finally led to

$$\begin{aligned} \nu _{n}\xi ^{''}_{n}(y)+2\pi (c_{0}\delta + ic_{1}\delta ^{'}+\ldots )*\big (e^{-\alpha x^2}\xi _{n}(x)\big )=0. \end{aligned}$$
(11)

The differential equation so obtained is difficult to solve for a generic power spectrum. In the next paragraph we study the case where we can obtain explicit solutions.

Case: \(1/\omega ^{2}\) Power Spectrum

In the case of the average power spectrum of natural images

$$ t^{\circledast }(\omega )=\frac{1}{\omega ^{2}} $$

the differential equation (11) assumes the particularly simple form

$$\begin{aligned} \nu _{n}\xi ^{''}_{n}(y)+2\pi e^{-\alpha y^2}\xi _{n}(y)=0. \end{aligned}$$
(12)

In the harmonic approximation, \(e^{-\alpha y^2} \approx 1-\alpha y^2\) (valid for \(\sqrt{\alpha }y\ll 1\)) we have

$$\begin{aligned} \xi ^{''}_{n}(y)+\frac{2\pi }{\nu _{n}}\big (1-\alpha y^2\big )\xi _{n}(y)=0. \end{aligned}$$
(13)

The equation above has the form of a so-called Weber differential equation:

$$ \xi ^{''}(y)+(ay^{2}+by+c)\xi (y)=0,\;\;a,b,c\in \mathbb {R}$$

The general solutions of Eq. (13) are:

$$ \xi _{n}(y)= C_{1} D \Big (-\frac{1}{2}+\frac{\pi ^{\frac{1}{2}}}{\sqrt{2\alpha \nu _{n}}},\frac{2^{\frac{3}{4}}\alpha ^{\frac{1}{4}}\pi ^{\frac{1}{4}}}{{\nu _{n}}^{\frac{1}{4}}}y\Big ) + C_{2} D \Big (-\frac{1}{2}-\frac{\pi ^{\frac{1}{2}}}{\sqrt{2\alpha \nu _{n}}},i\frac{2^{\frac{3}{4}}\alpha ^{\frac{1}{4}}\pi ^{\frac{1}{4}}}{{\nu _{n}}^{\frac{1}{4}}}y\Big ) $$

where \(D(\eta ,y)\) are parabolic cylinder functions and \(C_{1},C_{2}\) are constants. It can be proved (Müller-Kirsten (2012), p. 139) that the solutions have two different behaviors, exponentially increasing or exponentially decreasing, and that we have exponentially decreasing real solutions if \(C_{2}=0\) and the following quantization condition holds:

$$ -\frac{1}{2}+\frac{\pi ^{\frac{1}{2}}}{\sqrt{2\alpha \nu _{n}}}=n,\;n=0,1,\ldots $$

Therefore, remembering that \(\alpha =1/\sigma ^{2}_{x}\), we obtain the spectrum quantization condition

$$\begin{aligned} \nu _{n}=2\pi \frac{\sigma ^{2}_{x}}{(2n+1)^{2}},\;n=0,1,\ldots \end{aligned}$$
(14)

Further, using the identity (true if \(n\in \mathbb {N}\)):

$$ D(n,y)=2^{-\frac{n}{2}}e^{-\frac{y^{2}}{4}}H_{n}\big (\frac{y}{\sqrt{2}}\big ) $$

where \(H_{n}(y)\) are Hermite polynomials, we have:

$$ \xi _{n}(y) =2^{-\frac{n}{2}}e^{-\frac{2n+1}{2\sigma ^{2}_{x}}y^{2}}H_{n}\Big (\frac{\sqrt{2n+1}}{\sigma _{x}}y\Big ) $$

i.e.

$$\begin{aligned} \psi _{n}(y) = 2^{-\frac{n}{2}}e^{-\frac{n+1}{\sigma ^{2}_{x}}y^{2}}H_{n}\Big (\frac{\sqrt{2n+1}}{\sigma _{x}}y\Big ) \end{aligned}$$
(15)
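The Hermite-function solution can be checked against the harmonic-approximation equation (13) with the quantized eigenvalues of Eq. (14). The sketch below is a numerical sanity check under the conventions used above (physicists' Hermite polynomials, \(\alpha =1/\sigma ^{2}_{x}\)). The function names, grid and finite-difference scheme are choices made for the example.

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def xi(n, y, sigma_x=1.0):
    """xi_n(y) = 2^{-n/2} exp(-(2n+1) y^2 / (2 sigma_x^2)) H_n(sqrt(2n+1) y / sigma_x)."""
    scale = np.sqrt(2 * n + 1) / sigma_x
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                      # selects the physicists' Hermite polynomial H_n
    return 2.0 ** (-n / 2) * np.exp(-(scale * y) ** 2 / 2.0) * hermval(scale * y, coeffs)

def relative_residual(n, sigma_x=1.0, half_width=3.0, n_points=6001):
    """Max of |xi'' + (2 pi / nu_n)(1 - alpha y^2) xi| relative to max |xi|."""
    alpha = 1.0 / sigma_x**2
    nu_n = 2.0 * np.pi * sigma_x**2 / (2 * n + 1) ** 2    # Eq. (14)
    y = np.linspace(-half_width, half_width, n_points)
    h = y[1] - y[0]
    f = xi(n, y, sigma_x)
    second_derivative = (f[2:] - 2.0 * f[1:-1] + f[:-2]) / h**2   # central difference
    residual = second_derivative + (2.0 * np.pi / nu_n) * (1.0 - alpha * y[1:-1] ** 2) * f[1:-1]
    return np.max(np.abs(residual)) / np.max(np.abs(f))
```

For small n the residual is limited only by the finite-difference error. Multiplying `xi(n, y)` by \(e^{-\frac{\alpha }{2}y^2}\) gives the corresponding \(\psi _{n}\) of Eq. (15).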

The solutions plotted in Fig. 6 approximate Gabor functions very well.

Fig. 6

Examples of odd and even solutions of the differential equation (12)

Remark: The solution in Eq. (15) is also an approximate solution for any template spectrum such that

$$ \omega ^{2} t^{\circledast }(\omega )= const + O(\omega ) $$

This is important since it shows that the solutions are robust to small changes in the power spectrum of natural images.

Aperture Ratio

Using Eq. (15) the ratio between the width of the Gaussian aperture and that of the eigenfunctions can be calculated as

$$\begin{aligned} \frac{\sigma _{x}}{\sigma _{\psi _{n}}}= \sqrt{n+1} \end{aligned}$$
(16)

If we consider the first eigenfunction the ratio is \(\sqrt{2}\).

Oscillating Behavior of Solutions

Although there is no explicit expression for the frequency of the oscillating part of \(\psi _{n}\) in Eq. (15), we can use the following approximation, which calculates the frequency from the first crossing points of \(H_{n}(x)\), i.e. \(\pm \sqrt{2n+1}\) (Boyd 1984). In this approximation the oscillating part of (15) can therefore be written as

$$ \cos \big (\frac{2n+1}{\sigma _{x}}y-n\frac{\pi }{2}\big ) $$

which gives \(\omega _{n} =(2n+1)/\sigma _{x}\). Using \(\omega _{n}=2\pi /\lambda _{n}\) and (16) we finally have

$$ \frac{\lambda _{n}}{\sigma _{\psi _{n}}}=\frac{2\pi \sqrt{n+1}}{2n+1}. $$

The above equation gives an important piece of information: for any fixed eigenfunction, \(n=\bar{n}\), the number of oscillations under the Gaussian envelope is constant.
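The independence from the aperture size can be made explicit with a few lines of arithmetic. The sketch below simply evaluates Eq. (16) and \(\omega _{n}=(2n+1)/\sigma _{x}\) as stated above. The function names are illustrative.

```python
import math

def envelope_width(n, sigma_x):
    """sigma_psi_n from Eq. (16): sigma_x / sigma_psi_n = sqrt(n + 1)."""
    return sigma_x / math.sqrt(n + 1)

def wavelength(n, sigma_x):
    """lambda_n = 2 pi / omega_n with omega_n = (2n + 1) / sigma_x."""
    return 2.0 * math.pi * sigma_x / (2 * n + 1)

def oscillation_ratio(n, sigma_x=1.0):
    """lambda_n / sigma_psi_n, which should not depend on sigma_x."""
    return wavelength(n, sigma_x) / envelope_width(n, sigma_x)

if __name__ == "__main__":
    for n in range(4):
        # Same ratio for two very different aperture widths.
        print(n, oscillation_ratio(n, 0.5), oscillation_ratio(n, 3.0))
```

For each n the two printed values coincide and equal \(2\pi \sqrt{n+1}/(2n+1)\): changing \(\sigma _{x}\) rescales the wavelength and the envelope together.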

Equivalence of the Integral and Differential Equations

To prove that the solutions of the eigenproblem (9) are equivalent to those of (12), we start with the observation that, for the power spectrum of natural images, Eq. (3) can be written explicitly:

$$\begin{aligned} t^{\circledast }(y-x)=\frac{1}{\sqrt{2\pi }}\int d\omega \;t^{\circledast }(\omega )e^{i\omega (y-x)}=\frac{1}{\sqrt{2\pi }}\int \;d\omega \;\frac{e^{i\omega (y-x)}}{\omega ^{2}}= -\sqrt{\frac{\pi }{2}}|y-x|. \end{aligned}$$
(17)
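The reason an integral operator with kernel \(|y-x|\) reduces to a second-order differential equation is that differentiating \(|y-x|\) twice in y gives \(2\delta (y-x)\). The sketch below checks this on a grid: for a weight function \(w(x)\), the second difference of \(\int dx\, w(x)|y-x|\) equals \(2w(y)\). The particular \(w\), which stands in for \(e^{-\alpha x^2}\xi (x)\), and the function name are assumptions made for the example.

```python
import numpy as np

def second_derivative_of_abs_kernel(alpha=1.0, half_width=8.0, n_points=1601):
    """Compare d^2/dy^2 of  int w(x) |y - x| dx  with  2 w(y)  on a uniform grid."""
    x = np.linspace(-half_width, half_width, n_points)
    h = x[1] - x[0]
    # Assumption: hypothetical weight e^{-alpha x^2} xi(x) with xi(x) = cos(2x).
    w = np.exp(-alpha * x**2) * np.cos(2.0 * x)
    # F(y_j) = sum_i w(x_i) |y_j - x_i| h, evaluated on the same grid.
    integral = (np.abs(x[:, None] - x[None, :]) * w[None, :]).sum(axis=1) * h
    second_diff = (integral[2:] - 2.0 * integral[1:-1] + integral[:-2]) / h**2
    return second_diff, 2.0 * w[1:-1]
```

On a uniform grid the two returned arrays agree up to rounding, because the second difference of \(|k|\) over the integers is \(2\) at \(k=0\) and zero elsewhere.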

The integral equation (9) can be written, for \(a>0,\;a\rightarrow \infty \), as

$$ \xi (y)+c\int _{-a}^{a}e^{-\alpha x^{2}}|y-x|\xi (x)dx $$

where for simplicity we dropped the index n and \(c=\sqrt{\pi }/(\sqrt{2}\nu )\). Removing the modulus

$$ \xi (y)+c\int _{-a}^{y}e^{-\alpha x^{2}}(y-x)\xi (x)dx + c\int _{y}^{a}e^{-\alpha x^{2}}(x-y)\xi (x)dx. $$

Putting \(y=a,\;y=-a\) we can derive the boundary conditions

$$\begin{aligned}&\xi (a)+c\int _{-a}^{a}e^{-\alpha x^{2}}(a-x)\xi (x)dx\\&\xi (-a)+c\int _{-a}^{a}e^{-\alpha x^{2}}(a+x)\xi (x)dx \end{aligned}$$

Substituting, using the differential equation, \(e^{-\alpha x^{2}}\xi (x)\) with \(-\xi ''(x)\nu /2\pi \) and integrating by parts we have

$$\begin{aligned}&\xi (a)+ \xi (-a)+2ac'\xi '(-a) = 0\\&\xi (a)+ \xi (-a)-2ac'\xi '(a) = 0 \end{aligned}$$

where \(c'=1/2\sqrt{2}\). The above two boundary conditions together with the differential equation are equivalent to the initial integral eigenproblem. If we want bounded solutions at infinity: \(\xi (\infty )=\xi (-\infty )=\xi '(\infty )=\xi '(-\infty )=0\).

1.4 Motion Determines a Consistent Orientation of the Gabor-Like Eigenfunctions

Consider a 2D image moving through time t, \(I(x(t),y(t))=I(\mathbf{{x}}(t))\), filtered, as in the pipeline of Fig. 4, by a spatial low-pass filter and a band-pass filter, and call the output \(f(\mathbf{{x}}(t))\).

Suppose now that temporal filtering is performed with a high-pass impulse response h(t). For example, let \(h(t) \sim \frac{d}{dt}\). We consider the effect of the time derivative on the translated signal, \(\mathbf{x}(t)=\mathbf{x}-\mathbf{v}t\), where \(\mathbf {v}\in \mathbb {R}^2\) is the velocity vector:

$$\begin{aligned} {\frac{{\text {d}}f(\mathbf{{x}}(t))}{{\text {d}}t}} = \nabla f (\mathbf{{x}}(t)) \cdot \mathbf {v}. \end{aligned}$$
(18)

If, for instance, the direction of motion is along the x axis with constant velocity, \(\mathbf {v}=(v_{x},0)\), then Eq. (18) becomes

$$ \frac{{\text {d}} f (\mathbf{{x}}(t)) }{{\text {d}}t}= \frac{\partial f(\mathbf{{x}}(t))}{\partial x} v_{x}, $$

or, in Fourier domain of spatial and temporal frequencies:

$$\begin{aligned} \widehat{f}(i {\omega _t}) = i v_{x}\omega _{x}\hat{f}. \end{aligned}$$
(19)

Consider now an image I with a symmetric spectrum \(1/(\sqrt{\omega ^{2}_{x}+\omega ^{2}_{y}})\). Equation (19) shows that the effect of the time derivative is to break the radial symmetry of the spectrum in the direction of motion (depending on the value of \(v_{x}\)). Intuitively, spatial frequencies in the x direction are enhanced. Thus motion effectively selects a specific orientation since it enhances the frequencies orthogonal to the direction of motion in Eq. (1).
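A discrete analogue of Eq. (19) can be verified directly. In the sketch below a random image is translated by one pixel along x (with periodic boundaries, an assumption made for convenience) and the frame difference plays the role of the temporal derivative. The spectrum of the difference equals the image spectrum multiplied by \(2|\sin (\omega _{x}/2)|\), which is approximately \(|\omega _{x}|\) at low frequencies and does not depend on \(\omega _{y}\). Function name and image size are illustrative.

```python
import numpy as np

def derivative_gain(height=32, width=48, seed=0):
    """Return the measured and predicted amplitude spectra of a frame difference."""
    rng = np.random.default_rng(seed)
    image = rng.normal(size=(height, width))
    shifted = np.roll(image, 1, axis=1)          # one-pixel translation along x (periodic)
    frame_diff = shifted - image                 # discrete temporal derivative
    spectrum = np.fft.fft2(image)
    diff_spectrum = np.fft.fft2(frame_diff)
    omega_x = 2.0 * np.pi * np.fft.fftfreq(width)            # radians per pixel
    gain = 2.0 * np.abs(np.sin(omega_x / 2.0))               # ~ |omega_x| for small omega_x
    predicted = np.abs(spectrum) * gain[None, :]
    return np.abs(diff_spectrum), predicted
```

The gain vanishes at \(\omega _{x}=0\) for every \(\omega _{y}\), so a radially symmetric input spectrum is no longer symmetric after the temporal derivative, which is the symmetry breaking described above.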

Thus the theory suggests that motion effectively “selects” the direction of the Gabor-like function (see previous section) during the emergence and maintenance of a simple cell tuning. It turns out that, in addition to orientation, other features of the eigenvectors are shaped by motion during learning. This is shown by a simulation equivalent to that presented in Fig. 1 but in which the order of frames was scrambled before the time derivative stage. The receptive fields are still Gabor-like functions but lack the important property of having \(\sigma _x \propto \lambda \). This is summarized in Fig. 7.

Fig. 7

Summary plots for 2D simulations of V1 cells trained according to the pipeline described in Figs. 1 and 4 but scrambling the order of frames before the temporal derivative. Figures from top left to bottom right: sinusoid wavelength (\(\lambda \)) versus Gaussian aperture width (\(\sigma _\alpha \)); sinusoid wavelength (\(\lambda \)) versus Gaussian envelope width on the modulated direction (\(\sigma \)); Gaussian envelope width for the modulated direction (\(\sigma \)) versus Gaussian aperture width (\(\sigma _\alpha \)); ratio between sinusoid wavelength and Gaussian envelope width for the modulated direction (\(n_x\)) versus Gaussian aperture width (\(\sigma _\alpha \)); ratio between sinusoid wavelength and Gaussian envelope width on the unmodulated direction (\(n_y\)) versus ratio between sinusoid wavelength and Gaussian envelope width for the modulated direction (\(n_x\)). For an explanation of the details of this figure see Poggio et al. (2013)

The theory also predicts, assuming that the cortical equation provides a perfect description of Hebbian synapses, that the even eigenfunctions have slightly different \(n_x, n_y\) relations than odd eigenfunctions. It is unlikely that experimental data will allow this small difference to be detected.

1.5 Phase of Gabor RFs

We do not analyze the phase here, for a variety of reasons. The main reason is that phase measurements are rather variable within each species and across species. Phase is also difficult to measure. The general shape of the measured distributions shows a peak at 0 and a higher peak at 90. These peaks are consistent with the \(n=2\) eigenfunctions (even) and the \(n=1\) eigenfunctions (odd) of Eq. (2) (the zeroth-order eigenfunction is not included in the graph). The relative frequency of each peak would depend, according to our theory, on the dynamics of learning (Oja equation) and on the properties of the lateral inhibition between simple cells (needed to converge to eigenfunctions other than the first one). It is in any case interesting that the experimental data qualitatively fit our predictions: the odd eigenfunction \(\psi _1\) of the cortical equation should appear more often (because of its larger power) than the even eigenfunction \(\psi _2\), and no others with intermediate phases should exist, at least in the noiseless case (Fig. 8).

Fig. 8

Data from Jones and Palmer (1987) (Cat), Ringach (2002) (Macaque) and Niell and Stryker (2008) (Mouse). Here zero phase indicates even symmetry for the Gabor-like wavelet.

Copyright information

© 2017 Springer Science+Business Media Singapore

About this chapter

Cite this chapter

Mutch, J., Anselmi, F., Tacchetti, A., Rosasco, L., Leibo, J.Z., Poggio, T. (2017). Invariant Recognition Predicts Tuning of Neurons in Sensory Cortex. In: Zhao, Q. (eds) Computational and Cognitive Neuroscience of Vision. Cognitive Science and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-10-0213-7_5

  • DOI: https://doi.org/10.1007/978-981-10-0213-7_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0211-3

  • Online ISBN: 978-981-10-0213-7

  • eBook Packages: Engineering, Engineering (R0)
