Consistent estimation of complete neuronal connectivity in large neuronal populations using sparse “shotgun” neuronal activity sampling


Abstract

We investigate the properties of the recently proposed “shotgun” sampling approach for the common inputs problem in the functional estimation of neuronal connectivity. We study the asymptotic correctness, the speed of convergence, and the data size requirements of such an approach. We show that the shotgun approach can be expected to allow the inference of the complete connectivity matrix in large neuronal populations under some rather general conditions. However, we find that the posterior error of the shotgun connectivity estimator grows quickly with the size of unobserved neuronal populations, the square of the average connectivity strength, and the square of the observation sparseness. This implies that shotgun connectivity estimation will require significantly larger amounts of neuronal activity data whenever the number of neurons in the observed neuronal populations remains small. We present a numerical approach for solving the shotgun estimation problem in general settings and use it to demonstrate shotgun connectivity inference in examples of simulated synfire and weakly coupled cortical neuronal networks.


References

  • Abeles, M. (1991). Corticonics: Cambridge University Press.

  • Bellet, L.R. (2006). Ergodic properties of Markov processes. In Open Quantum Systems II (pp. 1–39). Berlin: Springer.

  • Berk, K.N. (1973). A Central Limit Theorem for m-Dependent Random Variables with Unbounded m. Annals of Probability, 1(2), 352–354.

  • Boyd, S.P. (2004). Convex optimization: Cambridge University Press.

  • Bradley, R.C. (2005). Basic properties of strong mixing conditions. A survey and some open questions. Probability surveys, 2, 107–144.

  • Braitenberg, V., & Schuz, A. (1998). Cortex: statistics and geometry of neuronal connectivity. Berlin: Springer.

  • Brillinger, D. (1988). Maximum likelihood analysis of spike trains of interacting nerve cells. Biological Cybernetics, 59, 189–200.

  • Brillinger, D. (1992). Nerve cell spike train data analysis: a progression of technique. Journal of the American Statistical Association, 87, 260–271.

  • Chornoboy, E., Schramm, L., & Karr, A. (1988). Maximum likelihood identification of neural point process systems. Biological Cybernetics, 59, 265–275.

  • Cotton, R.J., Froudarakis, E., Storer, P., Saggau, P., & Tolias, A.S. (2013). Three-dimensional mapping of microcircuit correlation structure. Frontiers in Neural Circuits, 7, 151.

  • Cossart, R., Aronov, D., & Yuste, R. (2003). Attractor dynamics of network up states in the neocortex. Nature, 423, 283–288.

  • Coulon-Prieur, C., & Doukhan, P. (2000). A triangular central limit theorem under a new weak dependence condition. Stat. Probab. Lett., 27(1), 61–68.

  • Davidson, J. (2006). Asymptotic methods and functional central limit theorems. In T.C. Mills, & K. Patterson (Eds.), Palgrave Handbooks of Econometrics: Palgrave-Macmillan.

  • Dedecker, J., & Merlevede, F. (2002). Necessary and sufficient conditions for the conditional central limit theorem. Annals of Probability, 30, 1044–1081.

  • Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1–38.

  • Djurisic, M., Antic, S., Chen, W.R., & Zecevic, D. (2004). Voltage imaging from dendrites of mitral cells: EPSP attenuation and spike trigger zones. Journal of Neuroscience, 24(30), 6703–6714.

  • Furedi, Z., & Komlos, J. (1981). The eigenvalues of random symmetric matrices. Combinatorica, 1, 233.

  • Doukhan, P. (1994). Mixing: Properties and Examples. Lecture Notes in Statistics 85. Springer.

  • Godsill, S., Doucet, A., & West, M. (2001). Maximum a Posteriori Sequence Estimation Using Monte Carlo Particle Filters. Annals of the Institute of Statistical Mathematics, 53(1), 82–96.

  • Gomez-Urquijo, S.M., Reblet, C., Bueno-Lopez, J.L., & Gutierrez-Ibarluzea, I. (2000). Gabaergic neurons in the rabbit visual cortex: percentage, distribution and cortical projections. Brain Research, 862, 171–9.

  • Grewe, B., Langer, D., Kasper, H., Kampa, B., & Helmchen, F. (2010). High-speed in vivo calcium imaging reveals neuronal network activity with near-millisecond precision. Nature Methods, 7, 399–405.

  • Guillotin-Plantard, N., & Prieur, C. (2010). Central limit theorem for sampled sums of dependent random variables. ESAIM: Probability and Statistics, 14, 299–314.

  • Hairer, M. (2010). “Convergence of Markov processes.” Lecture notes.

  • Hall, P., & Heyde, C.C. (2014). Martingale limit theory and its applications, (p. 320): Academic Press. Chapter 3.

  • Iyer, V., Hoogland, T.M., & Saggau, P. (2006). Fast functional imaging of single neurons using random-access multiphoton (RAMP) microscopy. Journal of Neurophysiology, 95(1), 535– 545.

  • Johnson, O. (2001). An Information-Theoretic Central Limit Theorem for Finitely Susceptible FKG Systems. Theory Probab. Appl., 50(2), 214–224.

  • Kantas, N., Doucet, A., Singh, S.S., & Maciejowski, J.H. (2009). An overview of sequential Monte Carlo methods for parameter estimation in general state-space models. In 15th IFAC Symposium on System Identification (SYSID), Saint-Malo, France, 2009 Jul 6, (Vol. 102 p. 117).

  • Keshri, S., Pnevmatikakis, E., Pakman, A., Shababo, B., & Paninski, L. (2013). A shotgun sampling solution for the common input problem in neuronal connectivity inference. arXiv:1309.3724.

  • Klartag, B. (2007). A central limit theorem for convex sets. Inventiones Mathematicae, 168, 91–131.

  • Koch, C. (1999). Biophysics of Computation: Oxford University Press.

  • Kulkarni, J., & Paninski, L. (2007). Common-input models for multiple neural spike-train data. Network: Computation in Neural Systems, 18, 375–407.

  • Lefort, S., Tomm, C., Floyd Sarria, J. -C., & Petersen, C.C.H. (2009). The excitatory neuronal network of the c2 barrel column in mouse primary somatosensory cortex. Neuron, 61, 301–16.

  • Lehmann, E.L. (1999). Elements of large-sample theory. New York: Springer. Chapter 2.8.

  • Mishchenko, Y., Vogelstein, J., & Paninski, L. (2011). A Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. Annals of Applied Statistics, 5, 1229– 61.

  • Mishchenko, Y., & Paninski, L. (2011). Efficient methods for sampling spike trains in networks of coupled neurons. The Annals of Applied Statistics, 5(3), 1893–1919.

  • Newman, C. (1984). Asymptotic Independence and Limit Theorems for Positively and Negatively Dependent Random Variables. Lecture Notes-Monograph Series, 127–140.

  • Neumann, M.H. (2013). A central limit theorem for triangular arrays of weakly dependent random variables, with applications in statistics. ESAIM: Probability and Statistics, 17, 120–134.

  • Nguyen, Q.T., Callamaras, N., Hsieh, C., & Parker, I. (2001). Construction of a two-photon microscope for video-rate Ca2+ imaging. Cell Calcium, 30(6), 383–393.

  • Nykamp, D. (2007). A mathematical framework for inferring connectivity in probabilistic neuronal networks. Mathematical Biosciences, 205, 204–251.

  • Nykamp, D.Q. (2005). Revealing pairwise coupling in linear-nonlinear networks. SIAM Journal of Applied Mathematics, 65(6), 2005–2032.

  • Ohki, K., Chung, S., Ch’ng, Y., Kara, P., & Reid, C. (2005). Functional imaging with cellular resolution reveals precise micro-architecture in visual cortex. Nature, 433, 597–603.

  • Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15, 243–262.

  • Paninski, L., Ahmadian, Y., Ferreira, D., Koyama, S., Rahnama, K., Vidne, M., Vogelstein, J., & Wu, W. (2010). A new look at state-space models for neural data. Journal of Computational Neuroscience, 29, 107–126.

  • Paninski, L., Fellows, M., Shoham, S., Hatsopoulos, N., & Donoghue, J. (2004). Superlinear population encoding of dynamic hand trajectory in primary motor cortex. Journal of Neuroscience, 24, 8551–8561.

  • Pillow, J., & Latham, P. (2007). Neural characterization in partially observed populations of spiking neurons. NIPS.

  • Pillow, J., Shlens, J., Paninski, L., Sher, A., Litke, A., Chichilnisky, E., & Simoncelli, E. (2008). Spatiotemporal correlations and visual signaling in a complete neuronal population. Nature, 454, 995–999.

  • Plesser, H., & Gerstner, W. (2000). Noise in integrate-and-fire neurons: From stochastic input to escape rates. Neural Computation, 12, 367–384.

  • Rabiner, L.R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, 77(2), 257–286.

  • Rasmussen, C.E., & Williams, C.K.I. (2006). Gaussian Processes for Machine Learning. MIT Press. Appendix B.

  • Reddy, G., Kelleher, K., Fink, R., & Saggau, P. (2008a). Three-dimensional random access multiphoton microscopy for functional imaging of neuronal activity. Nature neuroscience, 11, 713–720.

  • Reddy, G., Kelleher, K., Fink, R., & Saggau, P. (2008b). Three-dimensional random access multiphoton microscopy for functional imaging of neuronal activity. Nature Neuroscience, 11(6), 713–720.

  • Rigat, F., de Gunst, M., & van Pelt, J. (2006). Bayesian modelling and analysis of spatio-temporal neuronal networks. Bayesian Analysis, 1, 733–764.

  • Salome, R., Kremer, Y., Dieudonne, S., Leger, J.-F., Krichevsky, O., Wyart, C., Chatenay, D., & Bourdieu, L. (2006). Ultrafast random-access scanning in two-photon microscopy using acousto-optic deflectors. Journal of Neuroscience Methods, 154(1–2), 161–174.

  • Sayer, R.J., Friedlander, M.J., & Redman, S.J. (1990). The time course and amplitude of EPSPs evoked at synapses between pairs of CA3/CA1 neurons in the hippocampal slice. Journal of Neuroscience, 10, 826–36.

  • Soudry, D., Keshri, S., Stinson, P., Oh, M.-H., Iyengar, G., & Paninski, L. (2015). Efficient “Shotgun” inference of neural connectivity from highly sub-sampled activity data. PLOS Computational Biology, 11, e1004464.

  • Stevenson, I., Rebesco, J., Hatsopoulos, N., Haga, Z., Miller, L., & Koerding, K. (2008a). Inferring network structure from spikes. Statistical Analysis of Neural Data meeting.

  • Stevenson, I.H., Rebesco, J.M., Hatsopoulos, N.G., Haga, Z., Miller, L.E., & Kording, K.P. (2009). Bayesian inference of functional connectivity and network structure from spikes. IEEE Transactions on Neural Systems and Rehabilitation, 17, 203–13.

  • Stevenson, I.H., Rebesco, J.M., Miller, L.E., & Kording, K.P. (2008b). Inferring functional connections between neurons. Current Opinion in Neurobiology, 18, 582–8.

  • Stosiek, C., Garaschuk, O., Holthoff, K., & Konnerth, A. (2003). In vivo two-photon calcium imaging of neuronal networks. Proceedings of the National Academy of Sciences of the United States of America, 100(12), 7319–7324.

  • Theis, L., Berens, P., Froudarakis, E., Reimer, J., Roman-Roson, M., Baden, T., Euler T., Tolias A.S., & Bethge, M. (2015). Supervised learning sets benchmark for robust spike detection from calcium imaging signals. bioRxiv, 010777.

  • Truccolo, W., Eden, U., Fellows, M., Donoghue, J., & Brown, E. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble and extrinsic covariate effects. Journal of Neurophysiology, 93, 1074–1089.

  • Tsien, R.Y. (1989). Fluorescent probes of cell signaling. Annual Review of Neuroscience, 12, 227–253.

  • Turaga, S., Buesing, L., Packer, A., Dalgleish, H., Pettit, N., Hausser, M., & Macke, J. (2013). Inferring neural population dynamics from multiple partial recordings of the same neural circuit. NIPS.

  • Varadhan, S.R.S. (2001). Probability theory, volume 7 of Courant Lecture Notes in Mathematics. New York: New York University Courant Institute of Mathematical Sciences. Chapter 6.

  • Vidne, M., Ahmadian, Y., Shlens, J., Pillow, J., Kulkarni, J., Litke, A., Chichilnisky, E., Simoncelli, E., & Paninski, L. (2012). The impact of common noise on the activity of a large network of retinal ganglion cells. Journal of Computational Neuroscience, 33, 97–121.

  • Vidne, M., Kulkarni, J., Ahmadian, Y., Pillow, J., Shlens, J., Chichilnisky, E., Simoncelli, E., & Paninski, L. (2009). Inferring functional connectivity in an ensemble of retinal ganglion cells sharing a common input. COSYNE.

  • Vogelstein, J., Watson, B., Packer, A., Yuste, R., Jedynak, B., & Paninski, L. (2009). Spike inference from calcium imaging using sequential Monte Carlo methods. Biophysical Journal, 97, 636.

  • Vogelstein, J.T., Packer, A.M., Machado, T.A., Sippy, T., Babadi, B., Yuste, R., & Paninski, L. (2010). Fast nonnegative deconvolution for spike train inference from population calcium imaging. Journal of Neurophysiology, 104, 3691.

  • Wallace, D., zum Alten Borgloh, S., Astori, S., Yang, Y., Bausen, M., Kügler, S., Palmer, A., Tsien, R., Sprengel, R., Kerr, J., Denk, W., & Hasan, M. (2008). Single-spike detection in vitro and in vivo with a genetic Ca2+ sensor. Nature Methods, 5(9), 797–804.

  • Yatsenko, D., Josić, K., Ecker, A.S., Froudarakis, E., Cotton, R.J., & Tolias, A.S. (2015). Improved Estimation and Interpretation of Correlations in Neural Circuits. PLoS Computational Biology, 11, e1004083.

  • Yuste, R., Konnerth, A., Masters, B., et al. (2006). Imaging in Neuroscience and Development: A Laboratory Manual.

Download references

Acknowledgments

The author acknowledges financial support via the TUBITAK ARDEB 1001 research grant number 113E611 (Turkey) and the Bilim Akademisi (The Science Academy, Turkey) Young Investigator Award under the BAGEP program. The author acknowledges a key discussion with Liam Paninski leading to this work, and Daniel Soudry’s comments on an early version of this manuscript. The author is also thankful to the anonymous reviewers, whose comments led to many critical improvements of the manuscript.

Author information

Corresponding author

Correspondence to Yuriy Mishchenko.

Ethics declarations

Conflict of interests

The author declares that he has no conflict of interest.

Additional information

Action Editor: Rob Kass.

Appendices

Appendix A: Sequential Monte Carlo Expectation Maximization algorithm for the numerical solution of the shotgun connectivity estimation problem in general Markov models of neuronal population activity

The EM algorithm (Dempster et al. 1977) is the standard method of statistical inference in the presence of missing data. Briefly, the EM algorithm produces an (at least locally) maximum likelihood estimate of the parameters of a model P(X,Y|𝜃) given a set of observations X with the data Y missing, \(\hat \theta = \arg \max _{\theta } {\sum }_{Y} P(X,Y,\theta )\). The EM algorithm produces a sequence of parameter estimates \(\hat \theta ^{q}\) by iteratively maximizing the functions \(Q(\theta |\hat \theta ^{q})\),

$$ Q(\theta|\hat \theta^{q})=E_{P(Y|X,\hat \theta^{q})} [\log P(X,Y,\theta)], $$
(47)

where \(Q(\theta |\hat \theta ^{q})\) at each step is calculated by constructing M samples of the unavailable data Y from \(P(Y|X,\hat \theta ^{q})\) and using the following average,

$$ Q(\theta|\hat \theta^{q})=\frac{1}{M}{\sum}_{k=1}^{M} \log P(X,Y^{k},\theta). $$
(48)

In the case of shotgun sampling, the sampling step of the EM algorithm can be implemented using the forward-backward algorithm (Rabiner 1989) and the sequential Monte Carlo method also known as particle filtering (Godsill et al. 2001). In this case, the distribution of the hidden neuronal activities at every observation is modeled by a sample of M hidden neurons’ activity configurations, \({Y^{k}_{t}}\sim P(Y_{t}|X,\mathcal {W})\), each referred to as a “particle”.

In order to produce this sample, it is advantageous to reformulate the sampling problem \(Y_{t}\sim P(Y_{t}|X,\mathcal {W})\) as drawing a sample of the complete neuronal activity configurations \(\mathcal {X}_{t}\) in such a way that the activity of the parts of the neuronal population observed at time t matches the available observation data X_t. In this sense, we view the activity of the entire neuronal population \(\mathcal {X}_{t}\) as the “hidden” state, and the mapping of \(\mathcal {X}_{t}\) onto the subsets of the observed neurons, \(\mathsf {X}: \mathcal {X}_{t} \mapsto X_{t}\), as the observations of that state. In this form, the problem becomes that of sampling the sequence of the hidden states \(\mathcal {X}_{t}\) from a Hidden Markov Model given the observations X_t.

This problem now can be efficiently solved using the standard forward-backward algorithm.

The forward-backward algorithm consists of two passes. In the first forward pass, a sequence of samples of the hidden states is produced according to \(P(\mathcal {X}_{t}|X_{1:t},\mathcal {W})\), where X_{1:t} refers to the collection of all observed neuronal activities up to and including time t. Each sample in that sequence contains M examples of the complete neuronal population activity, \(\mathcal {X}_{t} \sim P(\mathcal {X}_{t}|X_{1:t},\mathcal {W})\), satisfying the constraint \(\mathsf {X}(\mathcal {X}_{t})=X_{t}\), while the entire sequence contains T such samples, t=1…T, where T is the number of observations, so that \(\{ \mathcal {X}^{k}_{t}, k=1{\ldots } M, t=1{\ldots } T \}\).

Forward pass samples can be constructed iteratively by drawing the initial sample \(\mathcal {X}_{0}\) from a prior distribution \(P(\mathcal {X}_{0})\) and then constructing each next sample according to,

$$ \begin{array}{l} \mathcal{X}_{t} \sim P(\mathcal{X}_{t}|X_{1:t})=\\ \mathcal{Z}^{-1} \sum\limits_{\mathcal{X}_{t-1}} P(X_{t}|\mathcal{X}_{t})P(\mathcal{X}_{t}|\mathcal{X}_{t-1})P(\mathcal{X}_{t-1}|X_{1:t-1}). \end{array} $$
(49)

Here \(\mathcal {Z}\) is a normalization constant to be calculated below; for brevity, we omit the parameter \(\mathcal {W}\) from the probability densities from here on.

According to Eq. (49), the forward step at each t can be realized by taking the previous sample’s particles \(\mathcal {X}^{k}_{t-1}\sim P(\mathcal {X}_{t-1}|X_{1:t-1})\) and “moving” them according to the transition probabilities

$$ P(\mathcal{X}^{k}_{t-1}\rightarrow \mathcal{X}^{k}_{t})=\mathcal{Z}^{-1}P(X_{t}|\mathcal{X}^{k}_{t})P(\mathcal{X}^{k}_{t}|\mathcal{X}^{k}_{t-1}). $$
(50)

Eq. (50) can be simplified by noting that \(P(X_{t}|\mathcal {X}_{t})\) has the effect of restricting the moves \(\mathcal {X}^{k}_{t-1}\rightarrow \mathcal {X}^{k}_{t}\) to those that make the activity patterns of the neurons observed in \(\mathcal {X}^{k}_{t}\) match the available observation X_t,

$$ P(X_{t}|\mathcal{X}^{k}_{t})\propto \left\{ \begin{array}{l} 1\ if\ {X^{k}_{t}}=X_{t}\\ 0\ otherwise \end{array} \right. $$
(51)

By using this and taking advantage of the factorization of the probabilities \(P(\mathcal {X}_{t}|\mathcal {X}_{t-1})\) over individual neurons i, \(P(\mathcal {X}_{t}|\mathcal {X}_{t-1})={\prod }_{i} P(\mathcal {X}_{it}|\mathcal {X}_{t-1})\), we obtain the normalization constant \(\mathcal {Z}\) explicitly as,

$$ \mathcal{Z}=E_{\mathcal{X}_{t-1}}[P(X_{t}|\mathcal{X}_{t-1})] =\frac{1}{M}\sum\limits_{k} P(X_{t}|\mathcal{X}^{k}_{t-1}). $$
(52)

With this simplification, we arrive at the final algorithm for the forward step (a minimal code sketch is given after the list):

  1. (i)

    Select one \(\mathcal {X}^{k}_{t-1}\) from the previous t−1 sample \(\mathcal {X}^{k}_{t-1}\sim P(\mathcal {X}_{t-1}|X_{1:t-1})\) with probability

    $$\begin{array}{l} p(k)=1/M\cdot P(X_{t}|\mathcal{X}^{k}_{t-1})/\mathcal{Z} \\ = P(X_{t}|\mathcal{X}^{k}_{t-1})/{\sum}_{k} P(X_{t}|\mathcal{X}^{k}_{t-1}); \end{array}, $$

    where

    $$P(X_{t}|\mathcal{X}_{t-1})=\sum\limits_{\mathcal{X}_{t}}P(X_{t}|\mathcal{X}_{t})P(\mathcal{X}_{t}|\mathcal{X}_{t-1}). $$
  2. (ii)

    Set in \(\mathcal {X}^{k}_{t}\) the activity of the neurons i observed in observation t as \(\mathcal {X}^{k}_{it}=X_{it}\);

  3. (iii)

    Set in \(\mathcal {X}^{k}_{t}\) the activity of the neurons \(i^{\prime }\) not observed in observation t as \(\mathcal {X}^{k}_{i^{\prime } t}\sim P(\mathcal {X}_{i^{\prime } t}|\mathcal {X}^{k}_{t-1})\).
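
A minimal sketch of the forward step above, assuming a Bernoulli spiking model with a logistic firing nonlinearity as a concrete instance of the general Markov transition density; the function name forward_step, the array layout, and the logistic link are illustrative assumptions, not the implementation used in the paper:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward_step(particles_prev, W, x_obs, obs_idx, rng):
    """One forward-pass step (steps (i)-(iii)) of the particle filter.

    particles_prev : (M, N) array of particles ~ P(X_{t-1} | X_{1:t-1})
    W              : (N, N) current estimate of the connectivity matrix
    x_obs          : binary activities observed at time t, shape (len(obs_idx),)
    obs_idx        : indices of the neurons observed at time t
    """
    M, N = particles_prev.shape
    # Spiking probabilities p[k, i] = P(X_it = 1 | W_i X^k_{t-1}); a logistic
    # link is assumed here (clipped away from 0/1 for numerical safety).
    p = np.clip(sigmoid(particles_prev @ W.T), 1e-12, 1 - 1e-12)

    # Step (i): resample ancestors with probabilities p(k) ~ P(X_t | X^k_{t-1}).
    lik = np.prod(np.where(x_obs == 1, p[:, obs_idx], 1 - p[:, obs_idx]), axis=1)
    ancestors = rng.choice(M, size=M, p=lik / lik.sum())

    particles = np.empty((M, N))
    # Step (ii): clamp the observed neurons to their observed values.
    particles[:, obs_idx] = x_obs
    # Step (iii): sample the unobserved neurons from P(X_i't | X^k_{t-1}).
    hid_idx = np.setdiff1d(np.arange(N), obs_idx)
    particles[:, hid_idx] = (rng.random((M, len(hid_idx)))
                             < p[ancestors][:, hid_idx]).astype(float)
    return particles
```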

In the backward pass, the samples \((\mathcal {X}_{t-1},\mathcal {X}_{t})\sim P(\mathcal {X}_{t-1},\mathcal {X}_{t}|X)\) need to be constructed for each t conditional on all observations X={X_t, t=1…T}. These samples can be constructed using the following relationship from Paninski et al. (2010),

$$ \begin{array}{l} P(\mathcal{X}_{t},\mathcal{X}_{t+1}|X)=P(\mathcal{X}_{t}|X_{1:t})\frac{P(\mathcal{X}_{t+1}|\mathcal{X}_{t})}{P(\mathcal{X}_{t+1}|X_{1:t})}P(\mathcal{X}_{t+1}|X), \end{array} $$
(53)

where \(P(\mathcal {X}_{t+1}|X_{1:t})={\sum }_{\mathcal {X}_{t}} P(\mathcal {X}_{t+1}|\mathcal {X}_{t})P(\mathcal {X}_{t}|X_{1:t})= E_{\mathcal {X}_{t}}[P(\mathcal {X}_{t+1}|\mathcal {X}_{t})]\), the average being over the forward pass sample \(\mathcal {X}^{k}_{t}\sim P(\mathcal {X}_{t}|X_{1:t})\).

According to Eq. (53), the backward step can be constructed by first combining into pairs the forward pass samples for observation t, \(\mathcal {X}^{k}_{t}\sim P(\mathcal {X}_{t}|X_{1:t})\), and the backward pass samples for observation t+1, \(\mathcal {X}^{l}_{t+1} \sim P(\mathcal {X}_{t+1}|X)\), and then weighting these with the weights \(w^{kl}_{t} = P(\mathcal {X}^{l}_{t+1}|\mathcal {X}^{k}_{t})/{\sum }_{k}P(\mathcal {X}^{l}_{t+1}|\mathcal {X}^{k}_{t})\). The pairs \((\mathcal {X}^{k}_{t},\mathcal {X}^{l}_{t+1})\) thus formed are distributed according to \((\mathcal {X}^{k}_{t},\mathcal {X}^{l}_{t+1})\sim P(\mathcal {X}_{t}|X_{1:t})P(\mathcal {X}_{t+1}|X)\), and the expectation value of any functional \(F(\mathcal {X}_{t},\mathcal {X}_{t+1})\) over \(P(\mathcal {X}_{t},\mathcal {X}_{t+1}|X)\) can be calculated by using such pairs as \(E[F]=1/M{\sum }_{kl}F(\mathcal {X}^{k}_{t},\mathcal {X}^{l}_{t+1})w^{kl}_{t}\). In addition, \(P(\mathcal {X}_{t}|X)={\sum }_{\mathcal {X}_{t+1}}P(\mathcal {X}_{t},\mathcal {X}_{t+1}|X)\), and the next backward pass sample for observation t, \(\mathcal {X}^{k}_{t}\sim P(\mathcal {X}_{t}|X)\), can be constructed by drawing with replacement \(\mathcal {X}^{k}_{t}\) from \((\mathcal {X}^{k}_{t},\mathcal {X}^{l}_{t+1})\) with probabilities \(p(k)\propto {\sum }_{l} w^{kl}_{t}\).

Thus, we arrive at the final backward step algorithm as follows (a minimal code sketch is given after the list):

  1. (i)

    Form M 2 pairs \((\mathcal {X}^{k}_{t},\mathcal {X}^{l}_{t+1})\) using the forward pass sample \(\mathcal {X}^{k}_{t}\sim P(\mathcal {X}_{t}|X_{1:t})\) and the backward pass sample \(\mathcal {X}^{l}_{t+1}\sim P(\mathcal {X}_{t+1}|X)\);

  2. (ii)

    Calculate the weights \(w^{kl}_{t} = P(\mathcal {X}^{l}_{t+1}|\mathcal {X}^{k}_{t})/{\sum }_{k}P(\mathcal {X}^{l}_{t+1}|\mathcal {X}^{k}_{t})\);

  3. (iii)

    As the next backward pass sample \(\mathcal {X}^{l}_{t}\sim P(\mathcal {X}_{t}|X)\) select with replacement \(\mathcal {X}^{k}_{t}\) from the pairs \((\mathcal {X}^{k}_{t},\mathcal {X}^{l}_{t+1})\) with the probabilities \(p(k)=1/M{\sum }_{l} w^{kl}_{t}\);

  4. (iv)

    The expectation value of a functional \(F(\mathcal {X}_{t},\mathcal {X}_{t+1})\), \(E_{P(\mathcal {X}_{t},\mathcal {X}_{t+1}|X)}[F(\mathcal {X}_{t},\mathcal {X}_{t+1})]\), is given by \(E[F]=1/M{\sum }_{kl}F(\mathcal {X}^{k}_{t},\mathcal {X}^{l}_{t+1})w^{kl}_{t}\).
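
The backward-pass weighting and resampling admit a similarly compact sketch, continuing the hypothetical Bernoulli/logistic model of the forward-pass sketch above; transition_prob and the array layout are again illustrative assumptions:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def transition_prob(x_next, x_prev, W):
    """All-pairs transition probabilities P(X^l_{t+1} | X^k_t), shape (M, M).

    x_next : (M, N) backward-pass particles at time t+1
    x_prev : (M, N) forward-pass particles at time t
    A Bernoulli/logistic conditional is assumed, as in the forward-pass sketch.
    """
    p = np.clip(sigmoid(x_prev @ W.T), 1e-12, 1 - 1e-12)   # (M, N)
    logP = np.log(p) @ x_next.T + np.log(1 - p) @ (1 - x_next).T
    return np.exp(logP)                                    # entry [k, l]

def backward_step(fwd_t, bwd_tp1, W, rng):
    """One backward-pass step: weights w^{kl}_t and resampled particles at t."""
    M = fwd_t.shape[0]
    P = transition_prob(bwd_tp1, fwd_t, W)        # step (i): all M^2 pairs
    w = P / P.sum(axis=0, keepdims=True)          # step (ii): w^{kl}, normalized over k
    p_k = w.sum(axis=1) / M                       # step (iii): p(k) = (1/M) sum_l w^{kl}
    bwd_t = fwd_t[rng.choice(M, size=M, p=p_k)]
    return bwd_t, w    # w is reused in step (iv) and in the M-step of Eq. (55)
```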

In the optimization step of the EM algorithm, we maximize with respect to \(\mathcal {W}\) the following function,

$$ \begin{array}{l} Q(\mathcal{W}|\hat{\mathcal{W}})=E_{P(Y|X,\hat{\mathcal{W}})} [\log P(X,Y,\mathcal{W})]\\ =\log P(\mathcal{W}) + E_{P(\mathcal{X}_{0}|X,\hat{\mathcal{W}})}[\log P(\mathcal{X}_{0})]\\ +\sum\limits_{t} E_{P(\mathcal{X}_{t-1},\mathcal{X}_{t}|X,\hat{\mathcal{W}})}[\log P(\mathcal{X}_{t}|\mathcal{X}_{t-1},\mathcal{W})]. \end{array} $$
(54)

In order to calculate \(Q(\mathcal {W}|\hat {\mathcal {W}})\) it is sufficient to know the samples \((\mathcal {X}_{t}^{k},\mathcal {X}_{t+1}^{l})\sim P(\mathcal {X}_{t}|X_{1:t})P(\mathcal {X}_{t+1}|X)\) and the weights \(w^{kl}_{t}\). Moreover, \(Q(\mathcal {W}|\hat {\mathcal {W}})\) can be split into a sum over the rows W_i of the matrix \(\mathcal {W}\), as \(Q(\mathcal {W}|\hat {\mathcal {W}})={\sum }_{i} Q(W_{i}|\hat {\mathcal {W}})\), with \(Q(W_{i}|\hat {\mathcal {W}})\) given by

$$ \begin{array}{l} Q(W_{i}|\hat{\mathcal{W}})=\log P(W_{i}) + E_{P(\mathcal{X}_{0}|X,\hat{\mathcal{W}})}[\log P(\mathcal{X}_{i0})]\\ +\sum\limits_{t} E_{P(\mathcal{X}_{t-1},\mathcal{X}_{t}|X,\hat{\mathcal{W}})}[\log P(\mathcal{X}_{it}|\mathcal{X}_{t-1},W_{i})]. \end{array} $$
(55)

Thus, the optimization of Eq. (54) can be carried out for each row i independently, reducing the complexity of the optimization problem from quadratic in the number of neurons N to linear. Moreover, inhomogeneous Poisson point-process models of neuronal activity with log-concave rate functions, such as the exponential \(f(.)=\exp (.)\), result in \(Q(W_{i}|\hat {\mathcal {W}})\) that are concave, so that their numerical maximization can be performed efficiently even for very large N using standard convex optimization methods (Boyd 2004).
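
To illustrate the per-row optimization, the sketch below maximizes \(Q(W_{i}|\hat {\mathcal {W}})\) for a Bernoulli neuron with a logistic firing nonlinearity and a Gaussian (ridge) prior on W_i, one possible log-concave instance of the general form in Eq. (55); the plain gradient ascent, the prior, and the step size are illustrative stand-ins for any standard convex solver:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def m_step_row(i, pairs, weights, N, prior_prec=1.0, lr=0.05, n_iter=500):
    """Maximize Q(W_i | W_hat) of Eq. (55) for one row W_i by gradient ascent.

    pairs   : list over t of (fwd_t, bwd_tp1), each an (M, N) particle array
    weights : list over t of the (M, M) backward-pass weight matrices w^{kl}_t
    A Bernoulli/logistic spiking model and a Gaussian prior on W_i with
    precision prior_prec are assumed (illustrative choices).
    """
    Wi = np.zeros(N)
    for _ in range(n_iter):
        grad = -prior_prec * Wi                       # gradient of log P(W_i)
        for (fwd_t, bwd_tp1), w in zip(pairs, weights):
            M = fwd_t.shape[0]
            p = sigmoid(fwd_t @ Wi)                   # P(X_{i,t+1} = 1 | W_i X^k_t)
            x = bwd_tp1[:, i]                         # X^l_{i,t+1}
            # E[ d log P(X_{i,t+1} | X_t, W_i) / dW_i ]
            #   = (1/M) sum_{k,l} w^{kl} (x^l_i - p_k) X^k_t
            resid = (w @ x) / M - (w.sum(axis=1) / M) * p
            grad += fwd_t.T @ resid
        Wi += lr * grad                               # concave objective: plain ascent
    return Wi
```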

Appendix B: Calculation of the partial observations log-likelihood in the linear neuronal activity model

In this appendix we calculate the integral

$$ \begin{array}{l} P(Z_{t},X_{t}|\hat W)\propto\\ \int dY_{t} \exp\left(-(Z_{t}-W\mathcal{X}_{t})^{2}/2-\mathcal{X}_{t}^{T}C^{-1}\mathcal{X}_{t}/2\right), \end{array} $$
(56)

of the model (18), where the input neuronal activities are distributed according to a correlated Gaussian distribution with zero mean and the covariance matrix C,

$$P(\mathcal{X}_{t})\propto \exp(-\mathcal{X}_{t}^{T}C^{-1}\mathcal{X}_{t}/2), $$

and the integration is performed over the part of \(\mathcal {X}_{t}\), Y_t, that is not observed during observation t. The part of \(\mathcal {X}_{t}\) observed during observation t is held fixed and equals X_t. W is a single row-vector from the full connectivity matrix \(\mathcal {W}\) corresponding to the input connection weights of one “output” neuron.

The calculations in Eq. (56) can be simplified if we represent the integral in an invariant form by introducing δ-functions that restrict the integration over \(\mathcal {X}_{t}\) to the hyperplane defined by fixing X_t, namely,

$$ \begin{array}{l} \int dY_{t} \exp\left(-(Z_{t}-W\mathcal{X}_{t})^{2}/2-\mathcal{X}_{t}^{T}C^{-1}\mathcal{X}_{t}/2\right) = \\ \int d\mathcal{X}_{t} \prod\limits_{i=1}^{i=m}\delta(\mathcal{X}_{it}-X_{it}) \\ \times \exp\left(-(Z_{t}-W\mathcal{X}_{t})^{2}/2-\mathcal{X}_{t}^{T}C^{-1}\mathcal{X}_{t}/2\right). \end{array} $$
(57)

Here m is the number of observed neuronal inputs and w.l.o.g. we assume that the observed inputs X_t comprise the first m elements of \(\mathcal {X}_{t}\). We now replace the δ-functions in Eq. (57) with their Fourier representation, \(\delta (x)=\frac 1{2\pi }\int dk e^{-ikx}\), yielding

$$ \begin{array}{l} \int d\mathcal{X}_{t} dK \prod\limits_{i=m+1}^{i=N}\delta(K_{i}) \exp\left(-(Z_{t}-W\mathcal{X}_{t})^{2}/2 \right.\\ \left. -iK^{T}(\mathcal{X}_{t}-\bar{X}_{t})-\mathcal{X}_{t}^{T}C^{-1}\mathcal{X}_{t}/2\right), \end{array} $$
(58)

where \(\bar {X}_{t}\) is a full-size column-vector of neuronal inputs, with the first m elements equal to X_t and the rest of the elements zero (these do not affect the integral since K_i=0 for i>m). In Eq. (58), the integral over \(\mathcal {X}_{t}\) can be taken explicitly as a Gaussian, resulting in

$$ \begin{array}{l} \int dK \prod\limits_{i=m+1}^{i=N}\delta(K_{i}) \exp(i\bar{X}_{t}^{T}K)\sqrt{\det{\Gamma}}\\ \times \exp\left(-{Z_{t}^{2}}/2+(Z_{t}W-iK^{T}){\Gamma}(Z_{t}W^{T}-iK)/2\right), \end{array} $$
(59)

where the matrix Γ is identified from the part of the argument of the exponential in Eq. (58) that is quadratic in \(\mathcal {X}_{t}\), \(\Gamma ^{-1}=C^{-1}+W^{T}W\). We expand the second term in the exponential in Eq. (59) as

$$ \begin{array}{l} \int dK \prod\limits_{i=m+1}^{i=N}\delta(K_{i}) \sqrt{\det{\Gamma}}\\ \times \exp\left(-{Z_{t}^{2}}/2+{Z_{t}^{2}}W{\Gamma} W^{T}/2 \right.\\ \left.+i\bar{X}_{t}^{T}K-iZ_{t}W{\Gamma} K - K^{T}{\Gamma} K/2\right). \end{array} $$
(60)

The δ-functions in Eq. (60) subsequently restrict the integration over K to only such values where K_i=0 for all m<i≤N. Thus, we rewrite this integration as

$$ \begin{array}{l} \int dK_{X} \sqrt{\det{\Gamma}}\\ \times \exp\left(-{Z_{t}^{2}}/2+{Z_{t}^{2}}W{\Gamma} W^{T}/2 \right.\\ \left.+i(\bar{X}_{t}^{T}-iZ_{t}W{\Gamma})_{X} K_{X} - {K_{X}^{T}}{\Gamma}_{X} K_{X}/2\right), \end{array} $$
(61)

where the subscript X means restriction to the first m elements, as contained in the observed set of neuronal inputs X_t. The integration over K_X thus obtained is again Gaussian, so we can perform it explicitly, producing

$$ \begin{array}{l} \int dY_{t} \exp(-(Z_{t}-W\mathcal{X}_{t})^{2}/2-\mathcal{X}_{t}^{T}C^{-1}\mathcal{X}_{t}/2) \propto \\ \sqrt{\frac{\det{\Gamma}}{\det{\Gamma}_{X}}} \exp\left(-{Z_{t}^{2}}/2+{Z_{t}^{2}}W{\Gamma} W^{T}/2 \right.\\ \left. -(\bar{X}_{t}^{T}-Z_{t}W{\Gamma})_{X}{\Gamma}_{X}^{-1}(\bar{X}_{t}-Z_{t}{\Gamma} W^{T})_{X}/2\right). \end{array} $$
(62)

As a simple check, we take C=I (uncorrelated inputs) and obtain, by repeatedly using the Woodbury lemma,

$$ \begin{array}{l} {\Gamma} = (I+W^{T}W)^{-1}=I-W^{T}W/(1+W^{2}) \\ {\Gamma}_{X} = I-{W_{X}^{T}}W_{X}/(1+W^{2}) \\ {\Gamma}_{X}^{-1}=I+{W_{X}^{T}}W_{X}/(1+{W_{Y}^{2}}) \\ (W{\Gamma}) = W/(1+W^{2}) \\ (W{\Gamma})_{X}=W_{X}/(1+W^{2}) \end{array} $$
(63)

where W_X and W_Y are the restrictions of W to the subsets of the neuronal inputs X_t and Y_t, respectively, and W^2=WW^T. For \(\det {\Gamma }\) and \(\det {\Gamma }_{X}\), we obtain \(\det {\Gamma }=(1+W^{2})^{-1}\) and \(\det {\Gamma }_{X}=(1+{W_{Y}^{2}})/(1+W^{2})\), so that,

$$\begin{array}{l} \det{\Gamma}/\det{\Gamma}_{X}=1/(1+{W_{Y}^{2}}). \end{array} $$

Similarly,

$$ \begin{array}{l} -{Z_{t}^{2}}+{Z_{t}^{2}}W{\Gamma} W^{T}=-{Z_{t}^{2}}/(1+W^{2})\\ {X_{t}^{T}}{\Gamma}_{X}^{-1}X_{t}={X_{t}^{2}}+(W_{X}X_{t})^{2}/(1+{W_{Y}^{2}})\\ Z_{t}(W{\Gamma})_{X}{\Gamma}_{X}^{-1}X_{t}=Z_{t}(W_{X}X_{t})/(1+{W_{Y}^{2}}) \\ {Z_{t}^{2}}(W{\Gamma})_{X}{\Gamma}_{X}^{-1}({\Gamma} W^{T})_{X}={Z_{t}^{2}}{W_{X}^{2}}/((1+W^{2})(1+{W_{Y}^{2}})) \end{array} $$
(64)

Bringing everything in Eq. (64) together, we obtain for the case C=I,

$$ \begin{array}{l} \int dY_{t} \exp(-(Z_{t}-W\mathcal{X}_{t})^{2}/2-\mathcal{X}_{t}^{T}C^{-1}\mathcal{X}_{t}/2) \\ \propto(1+{W_{Y}^{2}})^{-1/2} \exp\left(-\frac12\left[\frac{{Z_{t}^{2}}}{1+{W_{Y}^{2}}}-\frac{2Z_{t}W_{X}X_{t}}{1+{W_{Y}^{2}}}+\frac{(W_{X}X_{t})^{2}}{1+{W_{Y}^{2}}}\right]-\frac{{X_{t}^{2}}}{2}\right). \end{array} $$
(65)

The correctness of the expression (65) can be verified by direct integration of the original integral using C=I, which is relatively simple.
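
This check is easy to script. A minimal Monte Carlo version in Python is given below; the dimensions, weights, and sample size are arbitrary illustrative choices, and the dimension-dependent factor \((2\pi )^{n_{Y}/2}\) hidden by the proportionality is absorbed by sampling Y_t from a standard normal:

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, n_y = 3, 4                                   # observed / unobserved input dimensions
W_X, W_Y = rng.normal(size=n_x) * 0.5, rng.normal(size=n_y) * 0.5
X, Z = rng.normal(size=n_x), rng.normal()
WY2 = W_Y @ W_Y

# Monte Carlo estimate of  integral dY exp(-(Z - W_X X - W_Y Y)^2/2 - Y^2/2),
# rewritten as (2*pi)^(n_y/2) * E_{Y ~ N(0,I)}[exp(-(Z - W_X X - W_Y Y)^2 / 2)].
Y = rng.normal(size=(2_000_000, n_y))
a = Z - W_X @ X
mc = np.log(np.mean(np.exp(-0.5 * (a - Y @ W_Y) ** 2)))

# Closed form of Eq. (65) for C = I; the common factors exp(-X^2/2) and
# (2*pi)^(n_y/2) are dropped from both sides of the comparison.
closed = -0.5 * np.log(1 + WY2) - 0.5 * a ** 2 / (1 + WY2)

print(mc, closed)   # the two numbers agree to Monte Carlo accuracy
```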

For the case of general C, by similarly repeated use of the Woodbury lemma we obtain

$$ \begin{array}{l} {\Gamma} =C-C\frac{W^{T}W}{1+WCW^{T}}C \\ {\Gamma}_{X} = C_{XX}-C_{X*}\frac{W^{T}W}{1+WCW^{T}}C_{*X} \\ {\Gamma}_{X}^{-1}=C_{XX}^{-1}+C_{XX}^{-1}\frac{C_{X*}W^{T}WC_{*X}}{1+B^{2}}C_{XX}^{-1} \\ (W{\Gamma}) = \frac{WC}{(1+WCW^{T})} \\ (W{\Gamma})_{X}=\frac{WC_{*X}}{(1+WCW^{T})} \end{array} $$
(66)

where

$$B^{2}=WCW^{T}-WC_{*X}C_{XX}^{-1}C_{X*}W^{T}, $$

and C_{XX} is the square block of the full covariance matrix C corresponding to the observed inputs X_t, while C_{X*} and C_{*X} are the rectangular blocks of the full covariance matrix containing all the rows or the columns corresponding to the observed inputs X_t. Then, for the determinant factor we obtain,

$$\det{\Gamma} = \det C/(1+WCW^{T}) $$

and

$$\det{\Gamma}_{X} = \det C_{XX}(1+B^{2})/(1+WCW^{T}). $$

Consequently,

$$ \sqrt{\frac{\det{\Gamma}}{\det{\Gamma}_{X}}}= \sqrt{\frac{\det C}{\det C_{XX}}}\cdot(1+B^{2})^{-1/2}. $$
(67)

By representing

$$C=\left[ \begin{array}{cc} C_{XX} & C_{XY} \\ C_{YX} & C_{YY} \end{array} \right] $$

we can rewrite the quantity B^2 as

$$B^{2}=W_{Y}(C_{YY}-C_{YX}C_{XX}^{-1}C_{XY}){W_{Y}^{T}}. $$

We see then that B^2 plays here the role of \({W_{Y}^{2}}\) in Eq. (65). We further proceed to simplify the terms in the exponent of Eq. (62) using Eq. (66). This highly tedious calculation results in the following,

$$ \begin{array}{l} \int dY_{t} \exp(-(Z_{t}-W\mathcal{X}_{t})^{2}/2-\mathcal{X}_{t}^{T}C^{-1}\mathcal{X}_{t}/2) \propto \\ \sqrt{\frac{\det C}{\det C_{XX}}}(1+B^{2})^{-1/2} \\ \times\exp\left(-\frac{1}{1+B^{2}}\frac{{Z_{t}^{2}}}{2} -\frac{1}{2}{X_{t}^{T}}C_{XX}^{-1}X_{t} +\frac{Z_{t}WC_{*X}C_{XX}^{-1}X_{t}}{1+B^{2}} -\frac12\frac{(WC_{*X}C_{XX}^{-1}X_{t})^{2}}{1+B^{2}}\right). \end{array} $$
(68)

As a consistency check, we verify that Eq. (68) reduces to Eq. (65) if C=I. Finally, we rewrite Eq. (68) more concisely as

$$ \begin{array}{l} \int dY_{t} \exp(-(Z_{t}-W\mathcal{X}_{t})^{2}/2-\mathcal{X}_{t}^{T}C^{-1}\mathcal{X}_{t}/2) \propto \\ \sqrt{\frac{\det C}{\det C_{XX}}}(1+B^{2})^{-1/2} \\ \times\exp\left(-\frac12\frac{(Z_{t}-WC_{*X}C_{XX}^{-1}X_{t})^{2}}{1+B^{2}} - \frac12{X_{t}^{T}}C_{XX}^{-1}X_{t}\right) \end{array} $$
(69)
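
Equation (69) can be checked numerically in the same spirit for a general covariance C by brute-force integration over a two-dimensional unobserved part Y_t; the covariance construction, the grid, and the dimensions below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n_x, n_y = 2, 2
N = n_x + n_y
A = rng.normal(size=(N, N))
C = A @ A.T / N + np.eye(N)               # a random well-conditioned covariance
W = rng.normal(size=N) * 0.5
X = rng.normal(size=n_x)
Z = rng.normal()

# Blocks of C (observed inputs first, unobserved inputs last).
C_XX, C_sX = C[:n_x, :n_x], C[:, :n_x]
Ci = np.linalg.inv(C)
B2 = W @ C @ W - (W @ C_sX) @ np.linalg.solve(C_XX, C_sX.T @ W)

# Brute-force integration of Eq. (56) over Y_t on a 2-D grid.
g = np.linspace(-10, 10, 401)
Y1, Y2 = np.meshgrid(g, g)
Xi = np.stack([np.full(Y1.shape, X[0]), np.full(Y1.shape, X[1]), Y1, Y2])
quad = np.einsum('iab,ij,jab->ab', Xi, Ci, Xi)          # X_t^T C^{-1} X_t on the grid
Wdot = np.tensordot(W, Xi, axes=1)                      # W X_t on the grid
integrand = np.exp(-0.5 * (Z - Wdot) ** 2 - 0.5 * quad)
brute = np.log(integrand.sum() * (g[1] - g[0]) ** 2)

# Closed form, Eq. (69); the last term is the constant hidden by the proportionality.
mean = W @ C_sX @ np.linalg.solve(C_XX, X)
closed = (0.5 * np.log(np.linalg.det(C) / np.linalg.det(C_XX))
          - 0.5 * np.log(1 + B2)
          - 0.5 * (Z - mean) ** 2 / (1 + B2)
          - 0.5 * X @ np.linalg.solve(C_XX, X)
          + 0.5 * n_y * np.log(2 * np.pi))

print(brute, closed)   # the two numbers agree to grid accuracy
```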

Appendix C: Additional proofs

In this appendix we present the complete proofs of some of the propositions found in the main text.

Theorem 1 (restated)

Let \(P(\mathcal {X}|\mathcal {W})\) be a statistical model of neuronal population activity \(\mathcal {X}=\{\mathcal {X}_{t},t=1,2,\ldots \}\) and let S={S_t, t=1,2,…}, S∼P(S), be a series of partial observations of that model’s activity over subpopulations of neurons S_t. Let \(\mathcal {X}\) and S jointly define a stationary and ergodic stochastic process and assume that the classical MLE regularity conditions (A1)–(A3) hold for the model \(P(\mathcal {X}|\mathcal {W})\).

Assume further that the model \(P(\mathcal {X}|\mathcal {W})\) is uniquely identified in the sense of Definition 1 by a set of distributions \(\mathbf {P}(\mathbf {S})=\{ P(X_{t:t+k|S_{1:k+1}}|\,\mathcal {W})\), \(S_{1:k+1} \in \mathbf {S}\}\) for some \(\mathbf {S}\). Then, for any model of the partial observations P(S) such that the support \(\mathbf {S}^{\prime }=\{S_{t:t+k}:P(S_{t:t+k})>0\}\) completely covers \(\mathbf {S}\) in the sense of Definition 2, the ML estimator

$$ \begin{array}{l} \hat{\mathcal{W}}_{T}(\mathcal{X},S)=\underset{\hat{\mathcal{W}}}{\arg\max} L(\hat{\mathcal{W}}|\mathcal{X},S;T), \end{array} $$
(70)

where

$$ \begin{array}{l} L(\hat{\mathcal{W}}|\mathcal{X},S;T)=\\ \sum\limits_{t=1}^{t=T}\log P(X_{t:t+k|S_{t:t+k}},S_{t:t+k}|\hat{\mathcal{W}}), \end{array} $$
(71)

is consistent.

Proof

Under the conditions of the theorem, consider the average log-likelihood function

$$ \begin{array}{l} l_{T}(\hat{\mathcal{W}}|\mathcal{X},S)=\\ \frac{1}{T}\sum\limits_{t=1}^{t=T}\log P(X_{t:t+k|S_{t:t+k}},S_{t:t+k}|\hat{\mathcal{W}}). \end{array} $$
(72)

By the assumption of ergodicity, in the limit \(T\rightarrow \infty \) almost surely

$$ \begin{array}{l} l_{T}(\hat{\mathcal{W}}|\mathcal{X},S)\rightarrow E_{P(X_{t:t+k|S_{t:t+k}},S_{t:t+k}|{\mathcal{W}})} \\ \left\{\log P(X_{t:t+k|S_{t:t+k}},S_{t:t+k}|\hat{\mathcal{W}}) \right\}, \end{array} $$
(73)

where \(P(X_{t:t+k|S_{t:t+k}},S_{t:t+k}|{\mathcal {W}})\) is the stationary distribution of the model under the true parameter \(\mathcal {W}\). By construction, the tuples S_{t:t+k} and \(\mathcal {X}_{t:t+k}\) are statistically independent; therefore, we have in the limit \(T\rightarrow \infty \),

$$ \begin{array}{l} l_{\infty}(\hat{\mathcal{W}})= E_{P(S_{t:t+k})}\left\{E_{P(X_{t:t+k|S_{t:t+k}}|\mathcal{W})}\right.\\ \left. \left[\log P(X_{t:t+k|S_{t:t+k}}|\hat{\mathcal{W}})\right]\right\} + const. \end{array} $$
(74)

By the assumption of the coverage of the identifying set \(\mathbf {P}(\mathbf {S})\) by the support of P(S_{t:t+k}) and the Gibbs inequality, \(l_{\infty }(\hat {\mathcal {W}})\) achieves its maximum if and only if \(\hat {\mathcal {W}}=\mathcal {W}\), providing the identifiability condition of the MLE. Together with the regularity conditions (A1)-(A3), then, the series of estimates \(\hat {\mathcal {W}}_{T}=\arg \max l_{T}(\hat {\mathcal {W}})\) converges in probability to \(\hat {\mathcal {W}}=\arg \max l_{\infty }(\hat {\mathcal {W}})=\mathcal {W}\) as \(T\rightarrow \infty \). □

Lemma 1 (restated)

The expected log-likelihood function of model (18) is

$$ \begin{array}{l} l(\hat W_{i},\hat {\Sigma}) = \\ -1/2E\left\{ \frac{1+W_{i}{\Sigma} {W^{T}_{i}}-2\hat W_{i}\mathcal{A}_{X_{k}} {W^{T}_{i}}+\hat W_{i} \mathcal{A}^{\prime}_{X_{k}} \hat {W^{T}_{i}}}{1+B^{2}_{ik}} +\log(1+ B^{2}_{ik})\right\}\\ -1/2 E\left\{Tr[{\Sigma}_{X_{k}X_{k}}\hat{\Sigma}^{-1}_{X_{k}X_{k}}]+\log\det\hat{\Sigma}_{X_{k}X_{k}}\right\}, \end{array} $$
(75)

where the subscript notation in Σ refers to the blocks of Σ corresponding to the neuronal inputs identified by X_k and Y_k, with * referring to all row or column elements. \(B^{2}_{ik}\), \(\mathcal {A}_{X_{k}}\) and \(\mathcal {A}^{\prime }_{X_{k}}\) are

$$ \begin{array}{l} \mathcal{A}_{X_{k}}=\hat{\Sigma}_{*X_k}\hat{\Sigma}^{-1}_{X_kX_k}{\Sigma}_{X_k*} \\ \mathcal{A}^{\prime}_{X_{k}}=\hat{\Sigma}_{*X_k}\hat{\Sigma}^{- 1}_{X_kX_k}{\Sigma}_{X_kX_k}\hat{\Sigma}^{- 1}_{X_kX_k}\hat{\Sigma}_{X_k*} \\ B_{ik}^{2}=\hat W_{i} (\hat{\Sigma} - \hat {\Sigma}_{*X_{k}}\hat{\Sigma}_{X_{k}X_{k}}^{-1}\hat{\Sigma}_{X_{k}*})\hat {W^{T}_{i}} \end{array} $$
(76)

W_i and Σ are the true connection weights and the covariance matrix, respectively, and the average is over all different subsets of observed neurons X_k.

Proof

Consider the average log-likelihood function given the T_i realizations Z_i={Z_{ik}, k=1…T_i} and X={X_k, k=1…T_i} from model (18), marginal over the missing data Y={Y_k, k=1…T_i},

$$ l(W_{i},{\Sigma}|Z_{i},X)=\frac{1}{T_{i}} \log P(Z_{i},X|W_{i},{\Sigma}), $$
(77)

where \(P(Z_{i},X|W_{i},{\Sigma })=\int dY P(Z_{i},X,Y|W_{i},{\Sigma })\). Note that Z_i and X in Eq. (77) are the collections of T_i realizations and so the RHS of Eq. (77) depends on T_i in this manner. Also note that Z_i and X are the observed data in this setting, and Y is the missing data to be integrated out.

When the number of realizations T_i is large, by the law of large numbers the RHS of Eq. (77) can be seen to converge in probability to the expected value of \(\log P(Z_{ik},X_{k}|W_{i},{\Sigma })\) under the true distribution of the inputs and outputs P(Z_{ik},X_k),

$$ \begin{array}{l} l(\hat W_{i},\hat {\Sigma}|Z_{i},X)=\frac{1}{T_{i}} \log P(Z_{i},X|\hat W_{i},\hat {\Sigma})\\ =\frac{1}{T_{i}}\sum\limits_{k=1}^{k=T_{i}} \log P(Z_{ik},X_{k}|\hat W_{i},\hat {\Sigma})\\ \rightarrow E_{P(Z_{ik},X_{k})}[ \log P(Z_{ik},X_{k}|\hat W_{i},\hat{\Sigma})]. \end{array} $$
(78)

In the last line of Eq. (78) we recognize the expected log-likelihood function,

$$ \begin{array}{l} l(\hat W_{i},\hat {\Sigma})= \\ E[\log \int dY_{k} P(Z_{ik},X_{k},Y_{k}|\hat W_{i},\hat {\Sigma})], \end{array} $$
(79)

where the expectation, again, is with respect to the true density of the observed input and output variables, X_k and Z_{ik}, and we used

$$ \begin{array}{l} P(Z_{ik},X_{k}|\hat W_{i},\hat {\Sigma})=\\ \int dY_{k} P(Z_{ik},X_{k},Y_{k}|\hat W_{i},\hat {\Sigma}). \end{array} $$
(80)

We see now that it is necessary to calculate

$$ \begin{array}{l} P(Z_{ik},X_{k}|\hat W_{i},\hat {\Sigma}) = \\ \int dY_{k} \exp \left(- (Z_{ik}- \hat W_{iX_{k}}X_{k}-\hat W_{iY_{k}}Y_{k})^{2}/2\right. \\ \left. - \mathcal{X}_{k}^{T}\hat {\Sigma}^{-1}\mathcal{X}_{k}/2 + const\right), \end{array} $$
(81)

where \(\mathcal {X}_{k}\) is the vector of the complete input activities formed by suitably combining X_k and Y_k, that is, \(\mathcal {X}_{k}=[X_{k};Y_{k}]\). The integral in Eq. (81) can be calculated explicitly, although the respective calculation is lengthy; it is presented fully in Appendix B. The result of that calculation is

$$ \begin{array}{l} \log P(Z_{ik},X_{k}|\hat W_{i},\hat {\Sigma}) = \\ -1/2(1+B_{ik}^{2})^{-1}(Z_{ik}-\hat W_{i}\hat {\Sigma}_{*X_{k}}\hat {\Sigma}_{X_{k}X_{k}}^{-1}X_{k})^{2} \\ -1/2\log(1+B_{ik}^{2}) - {X_{k}^{T}}\hat {\Sigma}_{X_{k}X_{k}}^{-1}X_{k}/2\\ -1/2\log\det \hat {\Sigma}_{X_{k}X_{k}}+ const, \end{array} $$
(82)

where the scalars \(B_{ik}^{2}\) are defined by

$$\begin{array}{rl} B_{ik}^{2}&=\hat W_{i}\hat {\Sigma} \hat {W_{i}^{T}}-\hat W_{i}\hat {\Sigma}_{*X_{k}}\hat{\Sigma}_{X_{k}X_{k}}^{-1}\hat{\Sigma}_{X_{k}*}\hat {W^{T}_{i}} \\ &=\hat W_{iY_{k}}(\hat {\Sigma}_{Y_{k}Y_{k}}-\hat {\Sigma}_{Y_{k}X_{k}}\hat {\Sigma}_{X_{k}X_{k}}^{-1}\hat {\Sigma}_{X_{k}Y_{k}})\hat W_{iY_{k}}^{T}, \end{array} $$

and the subscripted notation in Σ refers to the blocks of Σ corresponding to the neuronal inputs identified by X_k and Y_k. For example, \({\Sigma }_{X_{k}X_{k}}\) is the submatrix of Σ composed of all elements of Σ located at the intersection of the rows and columns identified by X_k. Similarly, \({\Sigma }_{*X_{k}}\) is the rectangular submatrix of Σ containing all the columns corresponding to the observed inputs X_k, and \({\Sigma }_{X_{k}*}\) is a similar rectangular submatrix of all the X_k-rows of Σ.

Using Eq. (82), we can now obtain the final expression for the expected log-likelihood \(l(\hat W_{i},\hat {\Sigma })\),

$$ \begin{array}{l} l(\hat W_{i},\hat {\Sigma}) = \\ -1/2E\left[\frac{(Z_{ik}-\hat W_{i}\hat {\Sigma}_{*X_{k}}\hat {\Sigma}_{X_{k}X_{k}}^{-1}X_{k})^{2}}{1+B_{ik}^{2}}+ \log(1+B_{ik}^{2}) +{X_{k}^{T}}\hat {\Sigma}^{-1}_{X_{k}X_{k}}X_{k} + \log\det\hat {\Sigma}_{X_{k}X_{k}}\right], \end{array} $$
(83)

where the expectation is again under the true distribution of the observed inputs and outputs. We now take the average in Eq. (83) over all realizations X_k that share the same set of observed neurons. This leads to the following expression,

$$ \begin{array}{l} l(\hat W_{i},\hat {\Sigma}) = \\ -1/2E\left\{ \frac{1+W_{i}{\Sigma} {W^{T}_{i}}-2\hat W_{i}\mathcal{A}_{X_{k}} {W^{T}_{i}}+\hat W_{i} \mathcal{A}^{\prime}_{X_{k}} \hat {W^{T}_{i}}}{1+B^{2}_{ik}} +\log(1+ B^{2}_{ik})\right\}\\ -1/2 E\left\{Tr[{\Sigma}_{X_{k}X_{k}}\hat {\Sigma}_{X_{k}X_{k}}^{-1}]+\log\det\hat{\Sigma}_{X_{k}X_{k}}\right\}, \end{array} $$
(84)

where \(\mathcal {A}\) and \(\mathcal {A}^{\prime }\) are

$$ \begin{array}{l} \mathcal{A}_{X_{k}}=\hat{\Sigma}_{*X_{k}}\hat{\Sigma}_{X_{k}X_{k}}^{-1}{\Sigma}_{X_{k}*} \\ \mathcal{A}^{\prime}_{X_{k}}=\hat{\Sigma}_{*X_{k}}\hat{\Sigma}_{X_{k}X_{k}}^{-1} {\Sigma}_{X_{k}X_{k}}\hat{\Sigma}_{X_{k}X_{k}}^{-1}\hat{\Sigma}_{X_{k}*}, \\ \end{array} $$
(85)

and W_i and Σ are the true connection weights and the true covariance matrix parameters, respectively. The remaining average in Eq. (84) is over all different subsets of the observed inputs X_k. □

Theorem 3 (restated)

Consider a family of general “network type” Markov models of neuronal population activity characterized by an N×N connectivity matrix \(\mathcal {W}\) and a transition probability density

$$ P(\mathcal{X}_{t}|\mathcal{X}_{t-1};\mathcal{W})= \prod\limits_{i=1}^{i=N} P(\mathcal{X}_{it}|W_{i}\mathcal{X}_{t-1}), $$
(86)

where W_i is the i-th row of \(\mathcal {W}\) and \(N=dim(\mathcal {X}_{t})\). Let model (86) define an ergodic stochastic process and let \(\log P(\mathcal {X}_{t}|\mathcal {X}_{t-1};\mathcal {W})\) be L_1-integrable under the stationary distribution of that process. Also, let \(l_{T,N}(\mathcal {W}|\mathcal {X})\) be the average log-likelihood function for the realizations of the neuronal activity patterns \(\mathcal {X}=\{ \mathcal {X}_{t},t=1\ldots T\}\) in model (86),

$$ \begin{array}{l} l_{T,N}(\mathcal{W}|\mathcal{X})=\frac{1}{NT}\sum\limits_{t=1}^{t=T}\sum\limits_{i=1}^{i=N} \log P(\mathcal{X}_{it}|\mathcal{X}_{t-1};\mathcal{W}). \end{array} $$
(87)

In that case, if the sums

$$ \mathcal{J}_{it}=\sum\limits_{j=1}^{j=N}w_{ij}\mathcal{X}_{j,t-1}\rightarrow \mathcal{N}(m_{i},{\sigma_{i}^{2}}) $$
(88)

in distribution as \(N\rightarrow \infty \) for tuples \(\mathcal {X}_{t-1}\) from \(P(\mathcal {X}_{t-1})\) and some m_i and σ_i, possibly functions of N (the Central Limit Theorem), then the set of all triple-distributions \(P(\mathcal {X}_{i,t+1},\mathcal {X}_{jt},\mathcal {X}_{kt})\) is uniquely identifying for model (86) in the limit \(N\rightarrow \infty \) and, furthermore, \(l_{T,N}(\mathcal {W}|\mathcal {X})\rightarrow l_{\infty }(\mathcal {W})\) almost surely as \(T,N\rightarrow \infty \), where

$$ \begin{array}{l} l_{\infty}(\mathcal{W})=\frac{1}{N}\sum\limits_{i=1}^{i=N} \int d\mathcal{X}_{i}d\mathcal{J}_{i} \frac{1}{(2\pi W_{i}{\Sigma}(\mathcal{X}_{i}){W_{i}^{T}})^{1/2}} \times \\ \log P(\mathcal{X}_{i}|\mathcal{J}_{i}+W_{i}\mu(\mathcal{X}_{i}))e^{-\mathcal{J}_{i}^{2}/(2W_{i}{\Sigma}(\mathcal{X}_{i}){W_{i}^{T}})} \end{array} $$
(89)

and \(\mu (\mathcal {X}_{i})=E[\mathcal {X}_{t}|\mathcal {X}_{i,t+1}=\mathcal {X}_{i}]\) and \({\Sigma }(\mathcal {X}_{i})=\,\text {cov}(\mathcal {X}_{t}|\,\mathcal {X}_{i,t+1}=\mathcal {X}_{i})\).

Proof

Consider the average log-likelihood function \(l_{T,N}(\mathcal {W}|\mathcal {X})\) given by Eq. (87). By the ergodic theorem, each i-th term in Eq. (87) converges almost surely as \(T\rightarrow \infty \) to

$$ \begin{array}{l} \frac{1}{T}\sum\limits_{t=1}^{t=T}\log P(\mathcal{X}_{it}|\mathcal{J}_{it}) \rightarrow \\ \int d\mathcal{X}_{i} d\mathcal{J}_{i} \log P(\mathcal{X}_{i}|\mathcal{J}_{i}) P(\mathcal{X}_{i},\mathcal{J}_{i}), \end{array} $$
(90)

where \(P(\mathcal {X}_{i},\mathcal {J}_{i})\) is the joint distribution of \(\mathcal {J}_{it}\) and \(\mathcal {X}_{it}\) given the stationary distribution \(P(\mathcal {X}_{t})\). We rewrite \(P(\mathcal {X}_{i},\mathcal {J}_{i})=P(\mathcal {J}_{i}|\mathcal {X}_{i})P(\mathcal {X}_{i})\) and take advantage of the assumed validity of the Central Limit Theorem for the sums \(\mathcal {J}_{it}\), by which \(P(\mathcal {J}_{i}|\mathcal {X}_{i})\) approaches a normal distribution as \(N\rightarrow \infty \) with mean \(m_{i}=W_{i}\mu (\mathcal {X}_{i})\) and variance \({\sigma _{i}^{2}}=W_{i}{\Sigma }(\mathcal {X}_{i}){W_{i}^{T}}\). Then, as first T and then N tend to infinity, \(l_{T,N}(\mathcal {W}|\mathcal {X})\) tends to

$$ \begin{array}{l} l_{T,N}(\mathcal{W}|\mathcal{X})\rightarrow\frac{1}{N}\sum\limits_{i=1}^{i=N} \int d\mathcal{X}_{i}d\mathcal{J}_{i} \frac{1}{(2\pi W_{i}{\Sigma}(\mathcal{X}_{i}){W_{i}^{T}})^{1/2}} \\ \times\log P(\mathcal{X}_{i}|\mathcal{J}_{i})e^{-(\mathcal{J}_{i}-m_{i})^{2}/(2W_{i}{\Sigma}(\mathcal{X}_{i}){W_{i}^{T}})}. \end{array} $$
(91)

From Eq. (91) it is seen that \(\mu (\mathcal {X}_{i})\) and \({\Sigma }(\mathcal {X}_{i})\) are the sufficient statistics of model (86) in the limit \(N\rightarrow \infty \). Furthermore, \(\mu (\mathcal {X}_{i})\) and \({\Sigma }(\mathcal {X}_{i})\) are defined by the set of all distributions \(P(\mathcal {X}_{i,t+1},\mathcal {X}_{jt},\mathcal {X}_{kt})\). Then, the set of all distributions \(P(\mathcal {X}_{i,t+1},\mathcal {X}_{jt},\mathcal {X}_{kt})\) is uniquely identifying for model (86) in the limit \(N\rightarrow \infty \), by Corollary 1. □
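
The central limit condition (88) underlying this argument can also be probed empirically for a given network model. The sketch below simulates a Bernoulli/logistic instance of model (86), an illustrative choice as in the other sketches in these appendices, and reports the skewness and excess kurtosis of the summed inputs \(\mathcal {J}_{it}\) as informal normality diagnostics:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, burn = 1000, 2000, 200
# Sparse random connectivity, scaled so that the summed inputs stay O(1).
W = (rng.random((N, N)) < 0.05) * rng.normal(scale=1.0 / np.sqrt(0.05 * N), size=(N, N))

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Simulate model (86) with a Bernoulli/logistic conditional, collecting the
# summed inputs J_it = sum_j w_ij X_{j,t-1} at every step.
x = (rng.random(N) < 0.5).astype(float)
J = np.empty((T, N))
for t in range(T):
    J[t] = W @ x
    x = (rng.random(N) < sigmoid(J[t] - 1.0)).astype(float)   # bias keeps rates moderate

J = J[burn:]                      # drop the transient
for i in (0, 1, 2):               # informal normality check for a few neurons
    z = (J[:, i] - J[:, i].mean()) / J[:, i].std()
    print(f"neuron {i}: skew={np.mean(z**3):+.3f}, excess kurtosis={np.mean(z**4) - 3:+.3f}")
```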

About this article

Cite this article

Mishchenko, Y. Consistent estimation of complete neuronal connectivity in large neuronal populations using sparse “shotgun” neuronal activity sampling. J Comput Neurosci 41, 157–184 (2016). https://doi.org/10.1007/s10827-016-0611-y
