Automating the design of informative sequences of sensory stimuli


Adaptive stimulus design methods can potentially improve the efficiency of sensory neurophysiology experiments significantly; however, designing optimal stimulus sequences in real time remains a serious technical challenge. Here we describe two approximate methods for generating informative stimulus sequences: the first approach provides a fast method for scoring the informativeness of a batch of specific potential stimulus sequences, while the second method attempts to compute an optimal stimulus distribution from which the experimenter may easily sample. We apply these methods to single-neuron spike train data recorded from the auditory midbrain of zebra finches, and demonstrate that the resulting stimulus sequences do in fact provide more information about neuronal tuning in a shorter amount of time than do more standard experimental designs.


[Figures 1–8 appear in the original article.]


Footnotes

  1.

    The careful reader will have noticed that Eq. (15) depends on the future responses r t + i, through the projected Fisher information D(r t + i, ρ t + i). However, recall that we are taking the expectation of Eq. (15), by plugging into Eq. (4), and the expectation of each term in Eq. (15) may be computed directly using the methods described above, without violating any causality constraints.

  2.

    In Lewi et al. (2009), we discuss analytical methods for computing the optimal stimuli over sets of infinite cardinality, e.g., ellipsoidal sets of bounded squared norm in stimulus space. These methods relied strongly on the one-dimensional nature of the projected stimulus ρ t and are unfortunately not applicable in the b > 1 case, where the projected stimulus is b-dimensional instead of just one-dimensional.

  3.

    79 × 20 coefficients of the STRF + 8 spike history coefficients + 1 bias term = 1,589 unknown parameters.

  4.

    If C t is low-rank then \(C_t^{-1}\) is infinite in some directions, and the derivation will not hold because the contribution of \(C_t^{-1}\) will not become negligible as b→ ∞. In this case we can simply use a truncated design: i.e., we maximize the information in directions for which our prior uncertainty is not zero. To accomplish this we simply project \({\ensuremath{\theta}}\) into the lower-dimensional space corresponding to the space spanned by non-zero eigenvectors of C t . Alternately, in the case that C t has some very small but positive eigenvalues, it may be possible to approach the full objective function directly, though we have not pursued this direction systematically.

  5.

    It is also worth noting that the log term will typically be much smaller than the other terms when the stimulus dimension d s is large, since the first three terms in Eq. (25) scale linearly with d s .


References

  1. Anderson, M., & Micheli-Tzanakou, E. (1998). Computer-brain interaction to optimize auditory stimuli based on neuronal responses. In Bioengineering conference, 1998. Proceedings of the IEEE 24th Annual Northeast (pp. 18–20).

  2. Benda, J., Gollisch, T., Machens, C. K., & Herz, A. V. (2007). From response to stimulus: Adaptive sampling in sensory physiology. Current Opinion in Neurobiology, 17(4), 430–436.


  3. Calabrese, A., Schumacher, J., Schneider, D., Woolley, S., & Paninski, L. (2010). A generalized linear model for estimating receptive fields from midbrain responses to natural sounds. In: Conference Abstract: Computational and Systems Neuroscience. doi:10.3389/conf.fnins.2010.03.00126

  4. Chaloner, K., & Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, 10(3), 273–304.


  5. Chen, Z., Becker, S., Bondy, J., Bruce, I. C., & Haykin, S. C. (2005). A novel model-based hearing compensation design using a gradient-free optimization method. Neural Computation, 17(12), 2648–2671.


  6. Cover, T. M., & Thomas, J. A. (1991). Elements of information theory. New York: Wiley.


  7. David, S., Mesgarani, N., & Shamma, S. (2007). Estimating sparse spectro-temporal receptive fields with natural stimuli. Network, 18, 191–212.


  8. deCharms, R. C., Blake, D. T., & Merzenich, M. M. (1998). Optimizing sound features for cortical neurons. Science, 280(5368), 1439–1443.


  9. Edin, F., Machens, C., Schutze, H., & Herz, A. (2004). Searching for optimal sensory signals: Iterative stimulus reconstruction in closed-loop experiments. Journal of Computational Neuroscience, 17(1), 47–56.


  10. Fedorov, V. V. (1972). Theory of optimal experiments. New York: Academic Press.


  11. Foldiak, P. (2001). Stimulus optimisation in primary visual cortex. Neurocomputing, 38–40, 1217–1222.


  12. Gill, P., Zhang, J., Woolley, S., Fremouw, T., & Theunissen, F. (2006). Sound representation methods for spectro-temporal receptive field estimation. Journal of Computational Neuroscience, 21, 5–20.


  13. Hsu, A., Woolley, S. M. N., Fremouw, T. E., & Theunissen, F. E. (2004). Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons. Journal of Neuroscience, 24(41), 9201–9211.


  14. Kwon, W. H., & Han, S. (2005). Receding horizon control: Model predictive control for state models. New York: Springer.


  15. Lewi, J., Butera, R., & Paninski, L. (2007). Efficient active learning with generalized linear models. AISTATS07.

  16. Lewi, J., Butera, R., Schneider, D. M., Woolley, S. M. N., & Paninski, L. (2008). Designing neurophysiology experiments to optimally constrain receptive field models along parametric submanifolds. NIPS, 945–952.

  17. Lewi, J., Butera, R., & Paninski, L. (2009). Sequential optimal design of neurophysiology experiments. Neural Computation, 21(3), 619–687.


  18. Luczak, A., Bartho, P., Marguet, S., Buzsaki, G., & Harris, K. (2007). Sequential structure of neocortical spontaneous activity in vivo. PNAS, 104, 347–352.


  19. Machens, C. (2002). Adaptive sampling by information maximization. Physical Review Letters, 88, 228104–228107.


  20. Machens, C., Gollisch, T., Kolesnikova, O., & Herz, A. (2005). Testing the efficiency of sensory coding with optimal stimulus ensembles. Neuron, 47(3), 447–456.


  21. Machens, C. K., Wehr, M., & Zador, A. M. (2003). Spectro-temporal receptive fields of subthreshold responses in auditory cortex. Advances in Neural Information Processing Systems 15, 133–140.


  22. Mackay, D. J. C. (1992). Information-based objective functions for active data selection. Neural Computation, 4(4), 590–604.


  23. O’Connor, K. N., Petkov, C. I., & Sutter, M. L. (2005). Adaptive stimulus optimization for auditory cortical neurons. Journal of Neurophysiology, 94(6), 4051–4067.


  24. Ohki, K., Chung, S., Ch’ng, Y., Kara, P., & Reid, C. (2005). Functional imaging with cellular resolution reveals precise micro-architecture in visual cortex. Nature, 433, 597–603.


  25. Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15, 243–262.


  26. Paninski, L. (2005). Asymptotic theory of information-theoretic experimental design. Neural Computation, 17(7), 1480–1507.


  27. Paninski, L., Pillow, J., & Lewi, J. (2007). Statistical models for neural encoding, decoding, and optimal stimulus design. In P. Cisek, T. Drew, & J. Kalaska (Eds.), Computational neuroscience: Progress in brain research. New York: Elsevier.


  28. Pillow, J., Shlens, J., Paninski, L., Sher, A., Litke, A., Chichilnisky, E., et al. (2008). Spatiotemporal correlations and visual signaling in a complete neuronal population. Nature, 454, 995–999.


  29. Segev, R., Goodhouse, J., Puchalla, J., & Berry, M. J. (2004). Recording spikes from a large fraction of the ganglion cells in a retinal patch. Nature Neuroscience, 7(10), 1155–1162.


  30. Singh, N. C., & Theunissen, F. E. (2003). Modulation spectra of natural sounds and ethological theories of auditory processing. The Journal of the Acoustical Society of America, 114(6), 3394–3411.


  31. Smyth, D., Willmore, B., Baker, G., Thompson, I., & Tolhurst, D. (2003). The receptive-field organization of simple cells in primary visual cortex of ferrets under natural scene stimulation. Journal of Neuroscience, 23, 4746–4759.


  32. Theunissen, F. E., David, S. V., Singh, N. C., Hsu, A., Vinje, W. E., & Gallant, J. L. (2001). Estimating spatio-temporal receptive fields of auditory and visual neurons from their responses to natural stimuli. Network-Computation in Neural Systems, 12(3), 289–316.


  33. Theunissen, F. E., Sen, K., & Doupe, A. J. (2000). Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. Journal of Neuroscience, 20(6), 2315–2331.


  34. Theunissen, F. E., Woolley, S. M. N., Hsu, A., & Fremouw, T. (2004). Methods for the analysis of auditory processing in the brain. Annals of the New York Academy of Sciences, 1016, 187–207.


  35. Tzanakou, E., Michalak, R., & Harth, E. (1979). The alopex process: Visual receptive fields by response feedback. Biological Cybernetics, 35, 161–174.


  36. Woolley, S. M., Gill, P. R., & Theunissen, F. E. (2006). Stimulus-dependent auditory tuning results in synchronous population coding of vocalizations in the songbird midbrain. The Journal of Neuroscience, 26, 2499–2512.


  37. Woolley, S. M. N., & Casseday, J. H. (2004). Response properties of single neurons in the zebra finch auditory midbrain: Response patterns, frequency coding, intensity coding, and spike latencies. Journal of Neurophysiology, 91(1), 136–151.


  38. Woolley, S. M. N., & Casseday, J. H. (2005). Processing of modulated sounds in the zebra finch auditory midbrain: Responses to noise, frequency sweeps, and sinusoidal amplitude modulations. Journal of Neurophysiology, 94(2), 1143–1157.


  39. Yamane, Y., Carlson, E., Bowman, K., Wang, Z., & Connor, C. E. (2008). A neural code for three-dimensional object shape in macaque inferotemporal cortex. Nature Neuroscience, 11, 1352–1360.


Acknowledgements


We thank A. Ramirez, C. Rozell, and the anonymous referees for very helpful comments on the manuscript. JL was supported by the Computational Science Graduate Fellowship Program administered by the DOE under contract DE-FG02-97ER25308. DMS is supported by a Gatsby Initiative in Brain Circuitry fellowship, and by NIH Predoctoral Fellowship F31DC010301. SW is supported by the Searle Scholar’s Fund and by NIH grant 1R01DC009810. LP is supported by a McKnight Scholar award and an NSF CAREER award. LP and SW were jointly supported by a Gatsby Initiative in Brain Circuitry pilot grant. A description of some early preliminary results appeared in conference proceedings (Lewi et al. 2008).

Author information



Corresponding author

Correspondence to Liam Paninski.

Additional information

Action Editor: Alexander G. Dimitrov


Appendix A: Using a frequency representation to smooth the STRF

To represent the STRF in the Fourier domain, we applied the Fourier transform separately to the spectral and temporal dimensions of the STRF. Applying the separable Fourier transform to the STRF is just a linear transformation. This transformation maps the STRF into a coordinate system in which the basis functions are rank one matrices. Each of these matrices is the product of 1-dimensional sine-waves in the spectral and temporal directions of the STRF. Using these basis functions we can write the STRF such that each row and column of the STRF is a linear combination of 1-d sine-waves,

$$\begin{array}{lll} \theta(i,j)&=&\sum\limits_{\alpha=1}^{{\ensuremath{m_f}}}\sum\limits_{\beta=1}^{{\ensuremath{m_t}}} {\ensuremath{\gamma}}^1_{\alpha,\beta}\sin(2\pi\cdot{\ensuremath{f_{o,f}}}\cdot\alpha\cdot{}i) \\ &&\times\sin(2\pi\cdot{\ensuremath{f_{o,t}}}\cdot\beta\cdot{}j) \end{array} $$
$$\begin{array}{lll} &&+\sum\limits_{\alpha=1}^{{\ensuremath{m_f}}}\sum\limits_{\beta=0}^{{\ensuremath{m_t}}} {\ensuremath{\gamma}}^2_{\alpha,\beta}\sin(2\pi\cdot{\ensuremath{f_{o,f}}}\cdot\alpha\cdot{}i)\\ &&\times\cos(2\pi\cdot{\ensuremath{f_{o,t}}}\cdot\beta\cdot{}j)\\ \end{array} $$
$$\begin{array}{lll} &&+ \sum\limits_{\alpha=0}^{{\ensuremath{m_f}}}\sum\limits_{\beta=1}^{{\ensuremath{m_t}}} {\ensuremath{\gamma}}^3_{\alpha,\beta}\cos(2\pi\cdot{\ensuremath{f_{o,f}}}\cdot\alpha\cdot{}i) \\ &&\times\sin(2\pi\cdot{\ensuremath{f_{o,t}}}\cdot\beta\cdot{}j) \\ \end{array} $$
$$\begin{array}{lll} &&+\sum\limits_{\alpha=0}^{{\ensuremath{m_f}}}\sum\limits_{\beta=0}^{{\ensuremath{m_t}}} {\ensuremath{\gamma}}^4_{\alpha,\beta}\cos(2\pi\cdot{\ensuremath{f_{o,f}}}\cdot\alpha\cdot{}i)\\ &&\times \cos(2\pi\cdot{\ensuremath{f_{o,t}}}\cdot\beta\cdot{}j). \end{array} $$

The functions \(\sin(2\pi\cdot{\ensuremath{f_{o,f}}}\cdot\alpha\cdot{}i)\) and \(\cos(2\pi\cdot{\ensuremath{f_{o,f}}}\cdot\alpha\cdot{}i)\) determine how each basis function varies across the spectral dimension of the STRF while the functions \(\sin(2\pi\cdot{\ensuremath{f_{o,t}}}\cdot\beta\cdot{}j)\) and \(\cos(2\pi\cdot{\ensuremath{f_{o,t}}}\cdot\beta\cdot{}j)\) determine how the basis functions vary across time in the STRF. Each pair of sine-waves measures the amount of energy at particular frequencies in the spectral and temporal dimensions. The amplitude of each frequency is determined by the coefficients \({\ensuremath{\gamma}}^i_{{\ensuremath{\alpha}},{\ensuremath{\beta}}}\). To form an orthogonal basis for the STRF we need to project the STRF onto sinusoids with frequencies

$$\{0,{\ensuremath{f_{o,f}}},2{\ensuremath{f_{o,f}}},\ldots {\ensuremath{m_f}}{\ensuremath{f_{o,f}}}\} \quad \{0,{\ensuremath{f_{o,t}}},2{\ensuremath{f_{o,t}}},\ldots,{\ensuremath{m_t}}{\ensuremath{f_{o,t}}}\} $$
$${\ensuremath{f_{o,f}}}=\frac{1}{{\ensuremath{n_f}}} \quad {\ensuremath{f_{o,t}}}=\frac{1}{{\ensuremath{n_t}}} \\ $$
$${\ensuremath{m_f}}=\left\lceil\frac{1}{2{\ensuremath{f_{o,f}}}}-1\right\rceil \quad {\ensuremath{m_t}}=\left\lceil\frac{1}{2{\ensuremath{f_{o,t}}}}-1\right\rceil; $$

\({\ensuremath{f_{o,f}}}\) and \({\ensuremath{f_{o,t}}}\) are the fundamental frequencies and are set so that one period corresponds to the dimensions of the STRF (\({\ensuremath{n_t}}\) and \({\ensuremath{n_f}}\) denote the dimensions of the STRF in the time and frequency dimensions, respectively), and \({\ensuremath{m_f}}\) and \({\ensuremath{m_t}}\) are the largest integers such that \({\ensuremath{m_f}}{\ensuremath{f_{o,f}}}\) and \({\ensuremath{m_t}}{\ensuremath{f_{o,t}}}\) are less than the Nyquist frequency. We subtract 1 and take the ceiling to make sure the frequencies of our basis functions are less than the Nyquist frequency. The unknown parameters in this new coordinate system are the amplitudes, \({\ensuremath{\vec{\gamma}}}=\{{\ensuremath{\gamma}}^1_{\alpha,\beta},{\ensuremath{\gamma}}^2_{\alpha,\beta}, {\ensuremath{\gamma}}^3_{\alpha,\beta},{\ensuremath{\gamma}}^4_{\alpha,\beta}\}\). For simplicity, we will continue to refer to the unknown parameters as θ, realizing that the STRF is represented using this new basis. Since this transformation is linear we can continue to apply our methods for fitting the GLM and optimizing the stimuli.
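As an illustrative sketch (not part of the original analysis), the frequency bookkeeping above can be written out directly; the helper name `basis_frequencies` is ours, and the dimensions 79 × 20 are the STRF dimensions quoted in the footnotes.

```python
import math

def basis_frequencies(n):
    # Fundamental frequency: one period spans the n samples of the STRF
    # along this dimension; m is the largest integer with m * f_o strictly
    # below the Nyquist frequency of 1/2.
    f_o = 1.0 / n
    m = math.ceil(1.0 / (2.0 * f_o) - 1.0)
    return f_o, m

# Example: a 79 (frequency) x 20 (time) STRF, as in the footnotes.
f_of, m_f = basis_frequencies(79)
f_ot, m_t = basis_frequencies(20)
assert m_f * f_of < 0.5 and m_t * f_ot < 0.5   # strictly below Nyquist
print(m_f, m_t)  # 39 9
```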

To low-pass filter the STRF we can simply force the coefficients of \({\ensuremath{\theta}}\) corresponding to high frequencies to zero; i.e., we pick cutoffs \({\ensuremath{n_{tc}}}\) and \({\ensuremath{n_{fc}}}\) for the time and spectral directions, respectively, and set

$${\ensuremath{\gamma}}^i_{{\ensuremath{\alpha}},{\ensuremath{\beta}}}=0 \quad \textrm{if} \,\, \alpha > {\ensuremath{n_{fc}}} \,\, \textrm{or} \,\, \beta > {\ensuremath{n_{tc}}}. $$

Decreasing the cutoff frequencies not only makes the estimated STRFs smoother, it also reduces the dimensionality of the model. Reducing the dimensionality makes it easier to fit the GLM and optimize the stimuli, but the risk is that the lower-dimensional model may be too simple to adequately model auditory neurons. We can mitigate this risk by using a soft cutoff. Rather than force all high-frequency components to zero, we can adjust our prior to reflect our strong belief that high frequencies should have little energy; we simply set the prior mean of these coefficients to zero and decrease their prior variance. If we now estimate the STRF using the maximum of the posterior then the amplitudes at high frequencies will be biased by our prior towards zero. However, given sufficient evidence the posterior mean will yield non-zero estimates for the amplitudes of high frequencies. See Theunissen et al. (2001) for details and David et al. (2007), Calabrese et al. (2010) for further discussion.

We chose to impose a hard cutoff because we wanted to reduce the dimensionality, making online estimation of the model and online optimization of the stimuli more tractable. To choose the cutoff frequencies, we picked a single neuron and estimated the STRF using maximum likelihood for a variety of cutoff frequencies. We evaluated the quality of each model by computing the log-likelihood of the bird’s responses to inputs in a test set. The test set consisted of one bird song and one ml-noise stimulus which were not used to train the models. We chose the cutoff frequencies \({\ensuremath{n_{fc}}}=10\) and \({\ensuremath{n_{tc}}}=4\) because these values provided good predictive performance for both the bird song and ml-noise while keeping the number of unknown parameters tractable (in this case the STRF has 189 unknown parameters).
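As a quick consistency check on the parameter count quoted above (our sketch, not code from the paper): truncating the index ranges of the four sums in the basis expansion at the cutoffs gives exactly the 189 coefficients stated.

```python
# Coefficient counts for the four sums in the basis expansion, with the
# index ranges truncated at the cutoffs n_fc (spectral) and n_tc (temporal).
n_fc, n_tc = 10, 4
n_params = (n_fc * n_tc                 # sin-sin: alpha >= 1, beta >= 1
            + n_fc * (n_tc + 1)         # sin-cos: alpha >= 1, beta >= 0
            + (n_fc + 1) * n_tc         # cos-sin: alpha >= 0, beta >= 1
            + (n_fc + 1) * (n_tc + 1))  # cos-cos: alpha >= 0, beta >= 0
print(n_params)  # 189, matching the count quoted above
```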

Appendix B: Computing the average information for a Gaussian process

In this section we show how the average information per stimulus,

$${\ensuremath{E_{{\ensuremath{\theta}}}}} \log \left| E_{s} \exp(s^T{\ensuremath{\theta}})ss ^T \right|, $$

can be computed when the input distribution is a Gaussian process. For the GLM the expected Fisher information matrix \({\ensuremath{E_{s}}}J_{\rm exp}(s,{\ensuremath{\theta}})\) has a simple 1-dimensional dependence on \({\ensuremath{\theta}}\),

$$J_{\rm exp}(s,{\ensuremath{\theta}})=-E_{r} \frac{\partial^2 \log p(r|\rho=s^T{\ensuremath{\theta}})}{\partial \rho^2}\, ss^T = J_{\rm exp}(\rho=s^T{\ensuremath{\theta}})\,ss^T. $$

This 1-dimensional structure along with the fact that p(s) is Gaussian makes computing the expectations tractable. We start by defining a new coordinate system in which the first axis is aligned with \({\ensuremath{\theta}}\). This coordinate system is defined by the orthonormal matrix, \({\mathcal{R}_{\theta}}\). The first column of \({\mathcal{R}_{\theta}}\) is \(\frac{{\ensuremath{\theta}}}{{\ensuremath{||{{\ensuremath{\theta}}}||_2}}}\) and the remaining columns are a suitable set of orthonormal vectors. We can thus define the transformation of s and \({\ensuremath{\theta}}\) into this new coordinate system,

$${\ensuremath{\theta^{'}}}={\mathcal{R}_{\theta}}^T{\ensuremath{\theta}} $$
$${\ensuremath{\vec{w}}}={\mathcal{R}_{\theta}}^T s. $$

This coordinate system has the convenient properties

$${\ensuremath{\theta^{'}_{i}}}=0 \quad \forall i\neq1 \\ $$
$${\Rightarrow}{\ensuremath{\vec{w}}}{}^T{\ensuremath{\theta^{'}}} = w_{1}{\ensuremath{\theta^{'}_{1}}}.$$
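The change of coordinates just described can be sketched numerically as follows (our illustration, assuming NumPy; completing the basis via QR on a random matrix is one convenient choice):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 6
theta = rng.normal(size=d)

# Build an orthonormal matrix whose first column is theta / ||theta||_2,
# completing the remaining columns with a QR decomposition.
A = np.column_stack([theta, rng.normal(size=(d, d - 1))])
R_theta, _ = np.linalg.qr(A)
# QR may flip the sign of the first column; align it with theta.
if R_theta[:, 0] @ theta < 0:
    R_theta[:, 0] *= -1
assert np.allclose(R_theta[:, 0], theta / np.linalg.norm(theta))

# In the rotated coordinates theta' has a single nonzero entry...
theta_p = R_theta.T @ theta
assert np.allclose(theta_p[1:], 0, atol=1e-10)

# ...so w^T theta' collapses to w_1 * theta'_1 for any stimulus s.
s = rng.normal(size=d)
w = R_theta.T @ s
assert np.isclose(w @ theta_p, w[0] * theta_p[0])
assert np.isclose(w @ theta_p, s @ theta)   # the rotation preserves s^T theta
```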

We can now rewrite our objective function

$${\ensuremath{\mathcal{F}}}(p(s))={\ensuremath{E_{{\ensuremath{\theta}}}}}\log| {\ensuremath{E_{p(s)}}} J_{\rm exp}(\rho)ss^T| $$
$$= {\ensuremath{E_{{\ensuremath{\theta}}}}}\log| {\ensuremath{E_{p({\ensuremath{\vec{w}}})}}} J_{\rm exp}(w_{1}{\ensuremath{\theta^{'}_{1}}}){\ensuremath{\vec{w}}}{\ensuremath{\vec{w}}}{}^T| $$
$$\label{eqn:optfun} ={\ensuremath{E_{{\ensuremath{\theta}}}}}\log| {\ensuremath{E_{{w_{1}}}}} J_{\rm exp}(w_{1}{\ensuremath{\theta^{'}_{1}}}){\ensuremath{E_{{{w_2}},\ldots,{w_{\rm{dim}(s)}|w_{1}}}}}{\ensuremath{\vec{w}}}{\ensuremath{\vec{w}}}{}^T|. $$

Since p(s) is Gaussian and \({\ensuremath{\vec{w}}}={\mathcal{R}_{\theta}}^Ts\), \(p({\ensuremath{\vec{w}}})\) is Gaussian with mean \({\mathcal{R}_{\theta}}^T {\mu_s}\) and covariance matrix \({\mathcal{R}_{\theta}}^T {C_s} {\mathcal{R}_{\theta}}\). Consequently, \(p({\ensuremath{\vec{w}}}|w_{1})\) is also Gaussian and can be computed using the standard Gaussian conditioning formulas,

$$\begin{array}{lll} p({\ensuremath{\vec{w}}}|w_{1})&=&\mathcal{N}({\mathcal{R}_{\theta}}^T{\mu_s}+\frac{1}{{\ensuremath{\sigma^2_{\omega_1}}}}{\mathcal{R}_{\theta}}^T{\ensuremath{\boldsymbol{\gamma}}}(w_{1}-{\ensuremath{\mu_{\omega_1}}}),\\ &&{\mathcal{R}_{\theta}}^T{C_s}{\mathcal{R}_{\theta}}-\frac{1}{{\ensuremath{\sigma^2_{\omega_1}}}}{\mathcal{R}_{\theta}}^T{\ensuremath{\boldsymbol{\gamma}}}{\ensuremath{\boldsymbol{\gamma}}}^T{\mathcal{R}_{\theta}}) \end{array} $$
$${\ensuremath{\mu_{\omega_1}}}=\frac{{\ensuremath{\theta}}^T}{{\ensuremath{||{{\ensuremath{\theta}}}||_2}}}{\mu_s} $$
$${\ensuremath{\sigma^2_{\omega_1}}}=\frac{{\ensuremath{\theta}}^T}{{\ensuremath{||{{\ensuremath{\theta}}}||_2}}}{C_s} \frac{{\ensuremath{\theta}}}{{\ensuremath{||{{\ensuremath{\theta}}}||_2}}} \\ $$
$${\ensuremath{\boldsymbol{\gamma}}}= {C_s} \frac{{\ensuremath{\theta}}}{{\ensuremath{||{{\ensuremath{\theta}}}||_2}}}. $$
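The conditioning scalars above are easy to verify numerically (our sketch, assuming NumPy; the dimensions and random draws are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
theta = rng.normal(size=d)
u = theta / np.linalg.norm(theta)

# A random stimulus mean and a random positive-definite covariance.
mu_s = rng.normal(size=d)
B = rng.normal(size=(d, d))
C_s = B @ B.T + d * np.eye(d)

# Scalars entering the conditioning formulas above.
mu_w1 = u @ mu_s        # mean of w_1 = (theta^T / ||theta||_2) s
var_w1 = u @ C_s @ u    # variance of w_1
gamma = C_s @ u         # cross-covariance Cov(s, w_1)

# Rotate: w = R_theta^T s, with the first column of R_theta equal to u.
A = np.column_stack([u, rng.normal(size=(d, d - 1))])
R, _ = np.linalg.qr(A)
if R[:, 0] @ u < 0:
    R[:, 0] *= -1
C_w = R.T @ C_s @ R     # covariance of w in the rotated coordinates
assert np.isclose(C_w[0, 0], var_w1)
assert np.allclose(C_w[:, 0], R.T @ gamma)

# The conditional covariance C_w - (1/var_w1)(R^T gamma)(R^T gamma)^T
# must remain positive semidefinite.
cond_cov = C_w - np.outer(R.T @ gamma, R.T @ gamma) / var_w1
assert np.min(np.linalg.eigvalsh(cond_cov)) > -1e-10
```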

Using this distribution we can easily compute the conditional expectation,

$$\begin{array}{lll} &&{\kern-12pt}{\ensuremath{E_{{{\ensuremath{\vec{w}}}|w_{1}}}}} {\ensuremath{\vec{w}}}{\ensuremath{\vec{w}}}^T \\ &&= {\mathcal{R}_{\theta}}^T\left({C_s} - \frac{1}{{\ensuremath{\sigma^2_{\omega_1}}}}{\ensuremath{\boldsymbol{\gamma}}}{\ensuremath{\boldsymbol{\gamma}}}^T+\left({\mu_s}+\frac{1}{{\ensuremath{\sigma^2_{\omega_1}}}}{\ensuremath{\boldsymbol{\gamma}}}(w_{1}-{\ensuremath{\mu_{\omega_1}}})\right)\vphantom{\left(\frac{1}{{\ensuremath{\sigma^2_{\omega_1}}}}\right)^T}\right. \\&&{\kern3pc}\quad\times\left.\left({\mu_s}+\frac{1}{{\ensuremath{\sigma^2_{\omega_1}}}}{\ensuremath{\boldsymbol{\gamma}}}(w_{1}-{\ensuremath{\mu_{\omega_1}}})\right)^T \right){\mathcal{R}_{\theta}}\\ \end{array} $$
$$\begin{array}{lll} &&={\mathcal{R}_{\theta}}^T\left({C_s} +\left({\mu_s}+\frac{1}{{\ensuremath{\sigma^2_{\omega_1}}}}{\ensuremath{\boldsymbol{\gamma}}}(w_{1}-{\ensuremath{\mu_{\omega_1}}})-\frac{1}{\sqrt{{\ensuremath{\sigma^2_{\omega_1}}}}}{\ensuremath{\boldsymbol{\gamma}}}\right)\vphantom{\left(\frac{1}{\sqrt{{\ensuremath{\sigma^2_{\omega_1}}}}}\right)^T}\right. \\ &&\quad{\kern3pc}\times\left.\left({\mu_s}+\frac{1}{{\ensuremath{\sigma^2_{\omega_1}}}}{\ensuremath{\boldsymbol{\gamma}}}(w_{1}-{\ensuremath{\mu_{\omega_1}}})+\frac{1}{\sqrt{{\ensuremath{\sigma^2_{\omega_1}}}}}{\ensuremath{\boldsymbol{\gamma}}}\right)^T \right){\mathcal{R}_{\theta}}\\ \\ \end{array} $$
$$ ={\mathcal{R}_{\theta}}^T\left({C_s} + \left({\ensuremath{\vec{\kappa}}{w}}_{1}+{\ensuremath{\vec{\delta}}}\right)\left({\ensuremath{\vec{\kappa}}{w}}_{1}+{\ensuremath{\vec{\eta}}}\right)^T\right){\mathcal{R}_{\theta}} $$
$${\ensuremath{\vec{\kappa}}}=\frac{{\ensuremath{\boldsymbol{\gamma}}}}{{\ensuremath{\sigma^2_{\omega_1}}}}\\ $$
$${\ensuremath{\vec{\delta}}}={\mu_s}-\frac{{\ensuremath{\boldsymbol{\gamma}}}}{{\ensuremath{\sigma^2_{\omega_1}}}}{\ensuremath{\mu_{\omega_1}}}-\frac{{\ensuremath{\boldsymbol{\gamma}}}}{\sqrt{{\ensuremath{\sigma^2_{\omega_1}}}}}\\ $$
$${\ensuremath{\vec{\eta}}}={\mu_s}-\frac{{\ensuremath{\boldsymbol{\gamma}}}}{{\ensuremath{\sigma^2_{\omega_1}}}}{\ensuremath{\mu_{\omega_1}}}+\frac{{\ensuremath{\boldsymbol{\gamma}}}}{\sqrt{{\ensuremath{\sigma^2_{\omega_1}}}}}\\ $$

The key point is that the expected value is just a rank-1 perturbation of a rotated C s . We can now evaluate the expectation over w 1,

$$\begin{array}{lll} &&{\kern-20pt}{\ensuremath{E_{{w_{1}}}}}J_{\rm exp}\left(w_{1}{\ensuremath{\theta^{'}_{1}}}\right){\ensuremath{E_{{{\ensuremath{\vec{w}}}|w_{1}}}}} {\ensuremath{\vec{w}}}{\ensuremath{\vec{w}}}^T\\ &&\;={\mathcal{R}_{\theta}}^T\left( {C_s}{\ensuremath{\varpi_1}}+ {\ensuremath{\varpi_3}}\left[\left({\ensuremath{\vec{\kappa}}}+\frac{{\ensuremath{\varpi_2}}}{{\ensuremath{\varpi_3}}}{\ensuremath{\vec{\delta}}}\right)\left({\ensuremath{\vec{\kappa}}}+\frac{{\ensuremath{\varpi_2}}}{{\ensuremath{\varpi_3}}}{\ensuremath{\vec{\eta}}}\right)^T\right.\right.\\ &&\qquad\qquad\qquad\qquad\qquad\left.\left. +\left(\frac{{\ensuremath{\varpi_1}}}{{\ensuremath{\varpi_3}}}-\left(\frac{{\ensuremath{\varpi_2}}}{{\ensuremath{\varpi_3}}}\right)^2\right){\ensuremath{\vec{\delta}}}{\ensuremath{\vec{\eta}}}^T\right] \right){\mathcal{R}_{\theta}}\\ \label{eqn:ejww} \\ \end{array} $$
$${\ensuremath{\varpi_1}}={\ensuremath{E_{{w_{1}}}}} J_{\rm exp}(w_{1}{\ensuremath{\theta^{'}_{1}}})\\ $$
$${\ensuremath{\varpi_2}}={\ensuremath{E_{{w_{1}}}}} J_{\rm exp}(w_{1}{\ensuremath{\theta^{'}_{1}}}) w_{1} \\ $$
$${\ensuremath{\varpi_3}}={\ensuremath{E_{{w_{1}}}}} J_{\rm exp}(w_{1}{\ensuremath{\theta^{'}_{1}}}) w_{1}^2; $$

\(w_{1}=\frac{{\ensuremath{\theta}}^T}{{\ensuremath{||{{\ensuremath{\theta}}}||_2}}}s\), and p(w 1) is Gaussian with mean and variance \(({\ensuremath{\mu_{\omega_1}}},{\ensuremath{\sigma^2_{\omega_1}}})\). The above are just 1-dimensional expectations so for any value of \({\ensuremath{\theta}}\) we could compute them numerically.
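For the common exponential nonlinearity, \(J_{\rm exp}(\rho)=e^\rho\), these one-dimensional expectations even have closed forms (tilting a Gaussian by \(e^{aw}\) shifts its mean by \(a\sigma^2\)), which gives a convenient check on the Gauss–Hermite quadrature one would use for a general nonlinearity. The particular values of μ, σ, and \({\ensuremath{\theta^{'}_{1}}}\) below are arbitrary illustrations.

```python
import numpy as np

mu, sigma, a = 0.3, 0.7, 1.2   # mean/sd of w_1 and the slope theta'_1

# Closed forms for J_exp(rho) = exp(rho): tilting N(mu, sigma^2) by
# e^{a w} gives a Gaussian with mean mu + a sigma^2 and the same variance.
Z = np.exp(a * mu + 0.5 * a**2 * sigma**2)
varpi1 = Z
varpi2 = (mu + a * sigma**2) * Z
varpi3 = ((mu + a * sigma**2)**2 + sigma**2) * Z

# The same one-dimensional expectations by Gauss-Hermite quadrature,
# which works for any nonlinearity J_exp.
x, wts = np.polynomial.hermite.hermgauss(60)
w1 = mu + np.sqrt(2.0) * sigma * x
E = lambda f: (wts * f(w1)).sum() / np.sqrt(np.pi)
assert np.isclose(E(lambda w: np.exp(a * w)), varpi1)
assert np.isclose(E(lambda w: np.exp(a * w) * w), varpi2)
assert np.isclose(E(lambda w: np.exp(a * w) * w**2), varpi3)
```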

Equation (59) is a rank 2 update of C s . Therefore we can use the matrix determinant lemma to compute \(|{\ensuremath{E_{{w_{1}}}}}{\ensuremath{E_{{{\ensuremath{\vec{w}}}|w_{1}}}}} J_{\rm exp} {\ensuremath{\vec{w}}}{\ensuremath{\vec{w}}}^T|\),

$$\begin{array}{lll} &&{\kern-16pt}\log|{\ensuremath{E_{{w_{1}}}}}{\ensuremath{E_{{{\ensuremath{\vec{w}}}|w_{1}}}}} J_{\rm exp}(w_{1}{\ensuremath{\theta^{'}_{1}}}) {\ensuremath{\vec{w}}}{\ensuremath{\vec{w}}}^T| \\ &&=\rm{dim}({C_s})\log {\ensuremath{\varpi_1}} + \log | I + V^T({\ensuremath{\varpi_1}}{C_s})^{-1}U| \\ &&{\kern.8pc} +\log |{C_s}| \end{array} $$
$$U={\ensuremath{\varpi_3}} \left[\left({\ensuremath{\vec{\kappa}}}+\frac{{\ensuremath{\varpi_2}}}{{\ensuremath{\varpi_3}}}{\ensuremath{\vec{\delta}}}\right), \left(\frac{{\ensuremath{\varpi_1}}}{{\ensuremath{\varpi_3}}}-\left(\frac{{\ensuremath{\varpi_2}}}{{\ensuremath{\varpi_3}}}\right)^2\right){\ensuremath{\vec{\delta}}}\right]\\ $$
$$V=\left[\left({\ensuremath{\vec{\kappa}}}+\frac{{\ensuremath{\varpi_2}}}{{\ensuremath{\varpi_3}}}{\ensuremath{\vec{\eta}}}\right), {\ensuremath{\vec{\eta}}}\right]. $$

Since \( I + V^T({\ensuremath{\varpi_1}}{C_s})^{-1}U\) is a 2 × 2 matrix, we can compute its determinant analytically. Taking the expectation with respect to \({\ensuremath{\theta}}\) yields,

$$\begin{array}{lll} &&{\kern-6pt}{\ensuremath{E_{{\ensuremath{\theta}}}}}\log|{\ensuremath{E_{{w_{1}}}}}{\ensuremath{E_{{{\ensuremath{\vec{w}}}|w_{1}}}}} J_{\rm exp}(w_{1}{\ensuremath{\theta^{'}_{1}}}) {\ensuremath{\vec{w}}}{\ensuremath{\vec{w}}}^T| \\ &&{\kern6pt}=\rm{dim}({C_s}){\ensuremath{E_{{\ensuremath{\theta}}}}}\log {\ensuremath{\varpi_1}} + {\ensuremath{E_{{\ensuremath{\theta}}}}}\log | I \\ &&{\kern15pt}+ V^T({\ensuremath{\varpi_1}}{C_s})^{-1}U|+\log |{C_s}|. \end{array} $$
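The rank-2 determinant update used above can be checked numerically (our sketch, assuming NumPy). The 0.3 scaling of the factors is arbitrary and just keeps \(I + V^T C^{-1} U\) well-conditioned in this example.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 8
B = rng.normal(size=(d, d))
C = B @ B.T + d * np.eye(d)          # stands in for varpi_1 * C_s
U = 0.3 * rng.normal(size=(d, 2))    # rank-2 update factors, kept small
V = 0.3 * rng.normal(size=(d, 2))    # so that I + V^T C^{-1} U stays PD

# Matrix determinant lemma: |C + U V^T| = |C| * |I + V^T C^{-1} U|, so the
# log-determinant reduces to log|C| plus a 2 x 2 determinant.
lhs = np.linalg.slogdet(C + U @ V.T)[1]
small = np.eye(2) + V.T @ np.linalg.solve(C, U)
rhs = np.linalg.slogdet(C)[1] + np.log(np.linalg.det(small))
assert np.isclose(lhs, rhs)
```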

Appendix C: Solving the optimization problem described in Section 4.1.3

Our goal is to solve the optimization problem

$$ {\rm{arg}}\max\limits_{{\mu_s},{C_s}} d{\ensuremath{\vec{\mu}_{t}}}^T{\mu_s}+\frac{d}{2}Tr({C_s}{}R)+ \log|{C_s}|, $$

where we have abbreviated

$$ R={\ensuremath{\vec{\mu}_{t}}}{\ensuremath{\vec{\mu}_{t}}}^T+C_t, $$

over all (μ s ,C s ) subject to the constraints

$$s^T{C_s}s > 0 \quad \forall s\neq0 $$
$$Tr({C_s})< m- ||{\mu_s}||^2, $$

where m is the maximum allowed average stimulus power. Clearly the optimal μ s will be parallel to \({\ensuremath{\vec{\mu}_{t}}}\). Therefore, only the magnitude of the optimal μ s is unknown. We can therefore rewrite the objective function as

$${\rm{arg}}\max\limits_{||{\mu_s}||} \left[ d ||{\ensuremath{\vec{\mu}_{t}}}||\,||{\mu_s}|| +\max\limits_{{C_s}} \left(\frac{d}{2}Tr({C_s}{}R)+ \log|{C_s}| \right) \right]$$

We rewrite the inner problem to make the dependence on ||μ s || (through the power constraint) more explicit:

$$ {\rm{arg}}\max\limits_{{C_s}}\frac{d}{2}Tr({C_s}{}R)+ \log|{C_s}| $$
$$\textrm{s.t.} \quad s^T{C_s}s > 0 \quad \forall s\neq0 \\ $$
$$Tr({C_s})< m- ||{\mu_s}||^2. $$

We solve this optimization problem by introducing a Lagrange multiplier,

$$ L=\frac{d}{2}Tr({C_s}{}R)+\log|{C_s}| - \lambda Tr({C_s})$$
$$=Tr \left[ {C_s}{} \left( \frac{d}{2} (R-\lambda{}I ) \right) \right] + \log|{C_s}|.$$

This Lagrangian is exactly isomorphic to twice the log-likelihood in a multivariate Gaussian model with zero mean, if we interpret C s as the inverse covariance matrix in the Gaussian model and \(- \frac{d}{2} (R-\lambda{}I )\) as the observed sample covariance matrix. Standard arguments involving a change of basis now imply that the optimal C s is given by

$$ {C_s} = -\left( \frac{d}{2} (R-\lambda{}I ) \right)^{-1}, \label{eq:cs} $$

for any λ >  max i {r i }, where r i denotes the i-th eigenvalue of R. This condition on λ is required to ensure that the resulting C s maximizes the Lagrangian L, and guarantees that C s is positive definite.

We now solve for λ by plugging this C s into our power constraint:

$$ Tr({C_s})=\frac{2}{d}\sum_i \frac{1}{\lambda-r_i} = m - ||{\mu_s}||^2 \label{eq:lambda} $$

We can easily solve this equation numerically on the allowed range λ >  max i r i to compute λ as a function of ||μ s ||. We can then in principle do a search over all ||μ s || to find the optimal value (μ s ,C s ). In fact, a more efficient method is to instead just compute the optimal (||μ s ||,C s ) for each value of λ; thus, a single 1-d search over λ is guaranteed to find the optimal (μ s ,C s ). Also note that we can compute the inverse in Eq. (77) efficiently for any value of λ by computing the eigendecomposition of R once and then using the fact that eig(R − λI) = eig(R) − λ; we used this formula already in Eq. (77).
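The steps above can be sketched end-to-end (our illustration, assuming NumPy): given the eigenvalues of R and a power budget, a bisection on λ recovers a positive-definite C s that meets the trace constraint. The dimensions and the budget below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
d_s = 6                        # stimulus dimension
d = 2.0                        # the scalar d in the objective
B = rng.normal(size=(d_s, d_s))
R = B @ B.T                    # stands in for mu_t mu_t^T + C_t (PSD)
r = np.linalg.eigvalsh(R)
power = 1.5                    # stands in for m - ||mu_s||^2

# Tr(C_s) = (2/d) sum_i 1/(lambda - r_i) is strictly decreasing in lambda
# on (max_i r_i, infinity), so bisection finds the unique root.
def trace_cs(lam):
    return (2.0 / d) * np.sum(1.0 / (lam - r))

lo = r.max() + 1e-12
hi = r.max() + 1.0
while trace_cs(hi) > power:    # expand until the trace falls below budget
    hi *= 2.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if trace_cs(mid) > power else (lo, mid)
lam = 0.5 * (lo + hi)

# Recover C_s = -((d/2)(R - lambda I))^{-1} and check that it is a valid
# covariance meeting the power constraint.
C_s = -np.linalg.inv(0.5 * d * (R - lam * np.eye(d_s)))
assert np.min(np.linalg.eigvalsh(C_s)) > 0
assert np.isclose(np.trace(C_s), power, atol=1e-6)
```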

Enforcing stationarity by incorporating Toeplitz constraints

It is worth noting, in the case of stimulus filters θ that extend over more than one time bin (i.e., n t as defined in Appendix A is greater than one), that the stimulus sequence drawn from the Gaussian distribution defined above will not be temporally stationary. Instead, the stimulus sequence will consist of a series of appended n t -long segments of draws from a Gaussian distribution, and therefore the marginal distribution of the inputs s t will in general be an n t -mixture of Gaussians, instead of a single Gaussian distribution. We recover a single Gaussian only in the special stationary case that the stimulus covariance C s is constrained to have a Toeplitz structure and the mean vector μ s is constrained to be constant with respect to time-shifts.

Since we are observing the neural responses r t given the input s t presented at each time point t, we should arguably optimize our information over this marginal mixture distribution, instead of the single Gaussian distribution optimized in this appendix. Alternately, we could enforce stationarity in our optimized Gaussian process by including Toeplitz constraints on C s in our derivation above. We have had limited success deriving a computationally efficient optimization strategy in either of these cases, but this remains an attractive direction for future research. Meanwhile, the results described in Section 4.2 (with a non-stationary optimized Gaussian stimulus ensemble) remain encouraging.
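A Toeplitz covariance of the kind contemplated above is straightforward to construct and test; the exponential (AR(1)-style) autocovariance below is an arbitrary illustrative choice, not one used in the paper.

```python
import numpy as np

# A stationary Gaussian stimulus requires a Toeplitz covariance, i.e.
# C_s[i, j] must depend only on the lag |i - j|. Build one from an
# exponential autocovariance sequence.
n_t = 8
lags = np.arange(n_t)
acov = np.exp(-lags / 2.0)
C_s = acov[np.abs(lags[:, None] - lags[None, :])]

# Constant along every diagonal (the Toeplitz property)...
for k in range(n_t):
    assert np.allclose(np.diag(C_s, k), acov[k])
# ...and positive definite, so it is a valid stimulus covariance.
assert np.min(np.linalg.eigvalsh(C_s)) > 0
```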



Cite this article

Lewi, J., Schneider, D.M., Woolley, S.M.N. et al. Automating the design of informative sequences of sensory stimuli. J Comput Neurosci 30, 181–200 (2011).



Keywords

  • Information theory
  • Generalized linear model
  • Birdsong
  • Active learning