
An L1-regularized logistic model for detecting short-term neuronal interactions

Journal of Computational Neuroscience

Abstract

Interactions among neurons are a key component of neural signal processing. Rich neural data sets potentially containing evidence of interactions can now be collected readily in the laboratory, but existing analysis methods are often not sufficiently sensitive and specific to reveal these interactions. Generalized linear models offer a platform for analyzing multi-electrode recordings of neuronal spike train data. Here we suggest an L1-regularized logistic regression model (L1L method) to detect short-term (order of 3 ms) neuronal interactions. We estimate the parameters in this model using a coordinate descent algorithm, and determine the optimal tuning parameter using a Bayesian Information Criterion. Simulation studies show that in general the L1L method has better sensitivity and specificity than the traditional shuffle-corrected cross-correlogram (covariogram) method. The L1L method is able to detect excitatory interactions with both high sensitivity and specificity in reasonably large recordings, even when the magnitude of the interactions is small; similar results hold for inhibition given sufficiently high baseline firing rates. Our study also suggests that false positives can be further removed by thresholding, because their magnitudes are typically smaller than those of true interactions. Simulations also show that the L1L method is somewhat robust to partially observed networks. We apply the method to multi-electrode recordings collected in the monkey dorsal premotor cortex (PMd) while the animal prepares to make reaching arm movements. The results show that some neurons interact differently depending on task conditions. The stronger interactions detected with our L1L method were also visible using the covariogram method.
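As a concrete illustration of the approach described above, the sketch below fits an L1-penalized logistic regression by proximal gradient descent (a simple stand-in for the coordinate descent algorithm used in the paper) and selects the tuning parameter γ by minimizing BIC(γ) = −2·loglik + df·log n over a small grid. All function names, the simulated spike data, and the γ grid are hypothetical; this is a minimal sketch, not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal map of the L1 penalty."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_logistic(X, y, gamma, n_iter=3000):
    """Fit logistic regression with L1 penalty gamma on the coefficients
    (intercept unpenalized) by proximal gradient descent."""
    n, p = X.shape
    Xd = np.hstack([np.ones((n, 1)), X])       # prepend intercept column
    lr = 4.0 / np.linalg.norm(Xd, 2) ** 2      # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-Xd @ beta))  # predicted spike probability per bin
        beta = beta - lr * (Xd.T @ (mu - y))   # gradient step on the negative loglik
        beta[1:] = soft_threshold(beta[1:], lr * gamma)  # L1 proximal step
    return beta

def bic(X, y, beta):
    """BIC(gamma) = -2 loglik + (number of nonzero coefficients) * log n."""
    eta = beta[0] + X @ beta[1:]
    loglik = np.sum(y * eta - np.log1p(np.exp(eta)))
    return -2.0 * loglik + np.count_nonzero(beta) * np.log(len(y))

# Toy example: the target neuron's spiking depends only on neuron 1's
# spikes in the previous bin; neuron 2 is an independent distractor.
rng = np.random.default_rng(0)
n = 5000
spikes = rng.binomial(1, 0.2, size=(n, 2)).astype(float)  # lagged spike indicators
eta = -2.5 + 2.0 * spikes[:, 0]                           # only neuron 1 interacts
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta))).astype(float)

fits = {g: l1_logistic(spikes, y, g) for g in (1.0, 10.0, 100.0)}
best = min(fits, key=lambda g: bic(spikes, y, fits[g]))
```

With strong enough regularization the distractor coefficient is set exactly to zero while the true interaction survives (shrunken), which is the sensitivity/specificity trade-off the BIC γ-selector navigates.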

[Figures 1–10 omitted]


References

  • Aertsen, A. M. H. J., Gerstein, G. L., Habib, M. K., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of ‘effective connectivity’. Journal of Neurophysiology, 61, 900–917.

  • Avalos, M., Grandvalet, Y., & Ambroise, C. (2003). Regularization methods for additive models. In Advances in intelligent data analysis V.

  • Batista, A. P., Santhanam, G., Yu, B. M., Ryu, S. I., Afshar, A., & Shenoy, K. V. (2007). Reference frames for reach planning in macaque dorsal premotor cortex. Journal of Neurophysiology, 98, 966–983.

  • Brillinger, D. R. (1988). Maximum likelihood analysis of spike trains of interacting nerve cells. Biological Cybernetics, 59, 189–200.

  • Brody, C. D. (1999). Correlations without synchrony. Neural Computation, 11, 1537–1551.

  • Brown, E. N., Kass, R. E., & Mitra, P. P. (2004). Multiple neural spike train data analysis: State-of-the-art and future challenges. Nature Neuroscience, 7(5), 456–461.

  • Chen, Z., Putrino, D. F., Ghosh, S., Barbieri, R., & Brown, E. N. (2010). Statistical inference for assessing functional connectivity of neuronal ensembles with sparse spiking data. IEEE Transactions on Neural Systems and Rehabilitation Engineering.

  • Chestek, C. A., Batista, A. P., Santhanam, G., Yu, B. M., Afshar, A., Cunningham, J. P., et al. (2007). Single-neuron stability during repeated reaching in macaque premotor cortex. Journal of Neuroscience, 27(40), 10742–10750.

  • Czanner, G., Grun, S., & Iyengar, S. (2005). Theory of the snowflake plot and its relations to higher-order analysis methods. Neural Computation, 17, 1456–1479.

  • Ecker, A. S., Berens, P., Keliris, G. A., Bethge, M., Logothetis, N. K., & Tolias, A. S. (2010). Decorrelated neuronal firing in cortical microcircuits. Science, 327(5965), 584–587.

  • Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407–499.

  • Eldawlatly, S., Jin, R., & Oweiss, K. G. (2009). Identifying functional connectivity in large-scale neural ensemble recordings: A multiscale data mining approach. Neural Computation, 21, 450–477.

  • Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.

  • Friedman, J., Hastie, T., Hofling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. Annals of Applied Statistics, 1(2), 302–332.

  • Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.

  • Fujisawa, S., Amarasingham, A., Harrison, M. T., & Buzsaki, G. (2008). Behavior-dependent short-term assembly dynamics in the medial prefrontal cortex. Nature Neuroscience, 11(7), 823–833.

  • Gao, Y., Black, M. J., Bienenstock, E., Wei, W., & Donoghue, J. P. (2003). A quantitative comparison of linear and non-linear models of motor cortical activity for the encoding and decoding of arm motions. In First intl. IEEE/EMBS conf. on neural eng. (pp. 189–192).

  • Gerstein, G. L., & Perkel, D. H. (1972). Mutual temporal relationships among neuronal spike trains: Statistical techniques for display and analysis. Biophysical Journal, 12, 453–473.

  • Harrison, M. T., & Geman, S. (2009). A rate and history-preserving resampling algorithm for neural spike trains. Neural Computation, 21, 1244–1258.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction. New York: Springer-Verlag.

  • Kass, R. E., Kelly, R. C., & Loh, W. (2011). Assessment of synchrony in multiple neural spike trains using loglinear point process models. Annals of Applied Statistics, 5(2B), 1262–1292. (Special Section on Statistics and Neuroscience)

  • Kelly, R. C., Smith, M. A., Kass, R. E., & Lee, T. S. (2010). Accounting for network effects in neuronal responses using L1 regularized point process models. In Advances in Neural Information Processing Systems (Vol. 23, pp. 1099–1107).

  • Kohn, A., & Smith, M. A. (2005). Stimulus dependence of neuronal correlation in primary visual cortex of the macaque. Journal of Neuroscience, 25(14), 3661–3673.

  • Kulkarni, J. E., & Paninski, L. (2007). Common-input models for multiple neural spike-train data. Network: Computation in Neural Systems, 18(5), 375–407.

  • Matsumura, M., Chen, D., Sawaguchi, T., Kubota, K., & Fetz, E. E. (1996). Synaptic interactions between primate precentral cortex neurons revealed by spike-triggered averaging of intracellular membrane potentials in vivo. Journal of Neuroscience, 16(23), 7757–7767.

  • McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman and Hall.

  • Meinshausen, N., & Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics, 37(1), 246–270.

  • Mishchencko, Y., Vogelstein, J. T., & Paninski, L. (2011). Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. Annals of Applied Statistics, 5(2B), 1229–1261. (Special Section on Statistics and Neuroscience)

  • Moran, D. W., & Schwartz, A. B. (1999). Motor cortical representation of speed and direction during reaching. Journal of Neurophysiology, 82, 2676–2692.

  • Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15, 243–262.

  • Park, M. Y., & Hastie, T. (2007). L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society, Series B, 69(4), 659–677.

  • Peng, J., Wang, P., Zhou, N., & Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104(486), 735–746.

  • Perkel, D. H., Gerstein, G. L., & Moore, G. P. (1967). Neuronal spike trains and stochastic point processes. II. Simultaneous spike trains. Biophysical Journal, 7, 414–440.

  • Perkel, D. H., Gerstein, G. L., Smith, M. S., & Tatton, W. G. (1975). Nerve-impulse patterns: A quantitative display technique for three neurons. Brain Research, 100, 271–296.

  • Qian, G., & Wu, Y. (2006). Strong limit theorems on the model selection in generalized linear regression with binomial responses. Statistica Sinica, 16, 1335–1365.

  • Reid, C. R., & Alonso, J. (1995). Specificity of monosynaptic connections from thalamus to visual cortex. Nature, 378(16), 281–284.

  • Rosset, S. (2004). Following curved regularized optimization solution paths. In Advances in Neural Information Processing Systems.

  • Santhanam, G., Sahani, M., Ryu, S., & Shenoy, K. (2004). An extensible infrastructure for fully automated spike sorting during online experiments. In Conf. proc. IEEE eng. med. biol. soc. (Vol. 6, pp. 4380–4384).

  • Stevenson, I. H., Rebesco, J. M., Hatsopoulos, N. G., Haga, Z., Miller, L. E., & Kording, K. P. (2009). Bayesian inference of functional connectivity and network structure from spikes. IEEE TNSRE (Special Issue on Brain Connectivity), 17(3), 203–213.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.

  • Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16, 385–395.

  • Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P., & Brown, E. N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93, 1074–1089.

  • Truccolo, W., Hochberg, L. R., & Donoghue, J. P. (2010). Collective dynamics in human and monkey sensorimotor cortex: Predicting single neuron spikes. Nature Neuroscience, 13(1), 105–111.

  • Wang, H., Li, B., & Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society, Series B, 71(3), 671–683.

  • Wasserman, L., & Roeder, K. (2009). High-dimensional variable selection. Annals of Statistics, 37, 2178–2201.

  • Wu, T., & Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics, 2(1), 224–244.

  • Zhao, M., & Iyengar, S. (2010). Nonconvergence in logistic and Poisson models for neural spiking. Neural Computation, 22, 1231–1244.

  • Zohary, E., Shadlen, M. N., & Newsome, W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 370, 140–143.

  • Zou, H., Hastie, T., & Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Annals of Statistics, 35(5), 2173–2192.


Acknowledgements

We thank Trevor Hastie and Erin Crowder for their advice during the early stages of this work. We thank Ashwin Iyengar for help with the scalable vector figures. The simulations were done using PITTGRID. We also thank the Action Editor and reviewers for their thoughtful comments.

Author information


Corresponding author

Correspondence to Mengyuan Zhao.

Additional information

Action Editor: Rob Kass

Appendix


The proof quotes two lemmas and two theorems from Qian and Wu (2006), one theorem from Fan and Li (2001), and one lemma from Park and Hastie (2007). To make them hold, we inherit conditions (C.1)–(C.14) from Qian and Wu (2006) and conditions (A)–(C) from Fan and Li (2001); we refer the reader to those papers for the details. Without elaborating those conditions, we paraphrase the quoted results as the lemmas for our proof. Intuitively, conditions (C.1)–(C.6) are requirements on link functions in general, which the logit link does not violate (Qian and Wu 2006). Conditions (C.7)–(C.13) are requirements on the covariates, ensuring that no single observation dominates as the sample size tends to infinity. Condition (C.14) and conditions (A)–(C) are requirements on the log-likelihood function, under which classical likelihood theory applies.

We denote by \(\boldsymbol\beta_0\) the true values of a collection of P parameters, of which only p are nonzero. Here we assume both p and P are finite and do not vary with the sample size n. Denote the log-likelihood function for logistic regression by l. \(\mathcal{C}\) and \(\mathcal{W}\) are the sets of all correct models and all wrong models, respectively. \(\hat{\boldsymbol\beta}_c\) stands for the unregularized MLE under the assumption of model \(c\in\mathcal{C}\), and \(\hat{\boldsymbol\beta}_w\) for the unregularized MLE under the assumption of model \(w\in\mathcal{W}\). \(\hat{\boldsymbol\beta}(\gamma)\) stands for the L1-regularized estimates at tuning parameter γ. A subscript c or w on \(\hat{\boldsymbol\beta}(\gamma)\) means that the nonzero estimates in \(\hat{\boldsymbol\beta}(\gamma)\) constitute model c or w.
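For use in the proof below, recall the BIC score minimized over the regularization path (this restates the criterion from the main text; here \(d_\gamma\) denotes the number of nonzero coefficients in \(\hat{\boldsymbol\beta}(\gamma)\)):

$$ BIC(\gamma) = -2\,l\left(\hat{\boldsymbol\beta}(\gamma)\right) + d_\gamma\log n. $$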

Lemma 1

(Theorem 2 in Qian and Wu 2006) Under (C.1)–(C.14), for any correct model \(c\in\mathcal{C}\)

$$ 0\leq l\left(\hat{\boldsymbol\beta}_c\right)-l(\boldsymbol\beta_0)=O(\log\log n), \mbox{\ a.s.} $$

Lemma 2

(Theorem 3 in Qian and Wu 2006) Under (C.1)–(C.14), for any wrong model \(w\in\mathcal{W}\)

$$ 0< l(\boldsymbol\beta_0)-l\left(\hat{\boldsymbol\beta}_w\right)=O(n), \mbox{\ a.s.} $$

Lemma 3

(Theorem 1 in Fan and Li 2001) Under (A)–(C), there exists a local maximizer \(\hat{\boldsymbol\beta}(\gamma)\) of the L1-regularized log-likelihood such that \(\|\hat{\boldsymbol\beta}(\gamma)-\boldsymbol\beta_0\|=O_p(n^{-1/2}+\gamma/n)\).

Lemma 4

(Lemma 4 in Qian and Wu 2006) Under (C.1)–(C.14), each component of \(\frac{\partial l}{\partial\boldsymbol\beta}(\boldsymbol\beta_0)\) is \(O(\sqrt{n\log\log n})\) a.s.

Lemma 5

(Lemma 6 in Qian and Wu 2006) Under (C.1)–(C.14), there exist two positive numbers \(d_1\) and \(d_2\) such that the eigenvalues of \(-\partial^2l/\partial\boldsymbol\beta\partial\boldsymbol\beta'\) at \(\boldsymbol\beta_0\) are bounded between \(d_1 n\) and \(d_2 n\) a.s. as n goes to infinity.

Lemma 6

(Lemma 1 in Park and Hastie 2007) If the intercept in the logistic model is not regularized, then when \(\gamma>\max_{j=1,\dots,P}\mid(\frac{\partial l}{\partial\boldsymbol\beta})_j\mid\), the intercept is the only nonzero coefficient.

Proof of the Theorem

Let \(\gamma_1>\gamma_2\). Denote by \(m_1\) the model consisting of the \(d_1\) nonzero parameters in \(\hat{\boldsymbol\beta}(\gamma_1)\), and by \(m_2\) the model consisting of the \(d_2\) nonzero parameters in \(\hat{\boldsymbol\beta}(\gamma_2)\). Therefore,

$$ \begin{array}{rll} BIC(\gamma_1)-BIC(\gamma_2) & = & -2l(\hat{\boldsymbol\beta}(\gamma_1))+d_1\log n\\ &&-\,\left[-2l\left(\hat{\boldsymbol\beta}(\gamma_2)\right)+d_2\log n\right]\\ & = & (d_1-d_2)\log n+2\left[l\left(\hat{\boldsymbol\beta}(\gamma_2)\right)\right.\\ &&\left.-\, l(\hat{\boldsymbol\beta}(\gamma_1))\right]\\ & = & (d_1-d_2)\log n\\ & & +2\left[l(\hat{\boldsymbol\beta}(\gamma_2))-l(\hat{\boldsymbol\beta}_{m_2}) +l(\hat{\boldsymbol\beta}_{m_2})\right.\\ &&\left.-\, l\left(\hat{\boldsymbol\beta}_{m_1}\right) +l(\hat{\boldsymbol\beta}_{m_1})-l\left(\hat{\boldsymbol\beta}(\gamma_1)\right)\right] \end{array} $$

If \(m_1\in\mathcal{C}\) and \(m_2\in\mathcal{C}\), by Lemma 1 we have \((d_1-d_2)\log n=O(\log n)<0\) and \(l(\hat{\boldsymbol\beta}_{m_2})-l(\hat{\boldsymbol\beta}_{m_1})=O(\log\log n)>0\). By the definition of maximum likelihood, we also have \(l(\hat{\boldsymbol\beta}(\gamma_2))-l(\hat{\boldsymbol\beta}_{m_2})<0\). Therefore, as long as \(l(\hat{\boldsymbol\beta}_{m_1})-l(\hat{\boldsymbol\beta}(\gamma_1))=o(\log n)\), we have \(BIC(\gamma_1)-BIC(\gamma_2)<0\), and the correct model \(m_1\), which has fewer parameters, is selected.

If \(m_1\in\mathcal{W}\) and \(m_2\in\mathcal{C}\), by Lemma 2 we have \((d_1-d_2)\log n=O(\log n)<0\) and \(l(\hat{\boldsymbol\beta}_{m_2})-l(\hat{\boldsymbol\beta}_{m_1})=O(n)>0\). Again by the definition of maximum likelihood, we have \(l(\hat{\boldsymbol\beta}_{m_1})-l(\hat{\boldsymbol\beta}(\gamma_1))>0\). Therefore, as long as \(l(\hat{\boldsymbol\beta}(\gamma_2))-l(\hat{\boldsymbol\beta}_{m_2})=o(n)\), we have \(BIC(\gamma_1)-BIC(\gamma_2)>0\), and the correct model \(m_2\) is selected.

Thus, it remains to show that, for any \(c\in\mathcal{C}\), we have \(l(\hat{\boldsymbol\beta}_{c})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\). Because \(l(\hat{\boldsymbol\beta}_{c})-l(\boldsymbol\beta_{0})=O(\log\log n)\), it suffices to show \(l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\). By a Taylor expansion, we have

$$ \begin{array}{rll} l(\boldsymbol\beta)-l(\boldsymbol\beta_{0}) &=& (\boldsymbol\beta-\boldsymbol\beta_{0})'\frac{\partial l(\boldsymbol\beta_0)}{\partial\boldsymbol\beta}\\ &&+\,\frac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_{0})' \frac{\partial^2l(\boldsymbol\beta_0)}{\partial\boldsymbol\beta\partial\boldsymbol\beta'}(\boldsymbol\beta-\boldsymbol\beta_{0})\\ &&+\, o\left(\left\|\boldsymbol\beta-\boldsymbol\beta_0\right\|^2\right). \end{array} $$

So, taking \(\boldsymbol\beta=\hat{\boldsymbol\beta}_c(\gamma)\) and applying Lemmas 3, 4, and 5, we have

$$ \begin{array}{rll} l(\boldsymbol\beta_{0})-l\left(\hat{\boldsymbol\beta}_c(\gamma)\right) &=& O\left(1/\sqrt{n}+\gamma/n\right)O\left(\sqrt{n\log\log n}\right)\\ &&+\,O(n)O\left(\left(1/\sqrt{n}+\gamma/n\right)^2\right). \end{array} $$

When \(\gamma=o(\sqrt{n\log n})\), we have \(l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\).

Finally, Lemma 6 states that when \(\gamma>\max_j\mid(\frac{\partial l}{\partial\boldsymbol\beta})_j\mid=O(\sqrt{n\log\log n})\), the fit is the null model containing only an intercept, so the tuning parameter \(\gamma\) never needs to exceed \(O(\sqrt{n\log\log n})=o(\sqrt{n\log n})\). Therefore, \(l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\) is achievable for all correct models given by \(\hat{\boldsymbol\beta}(\gamma)\), and the BIC \(\gamma\)-selector selects the correct model with the smallest number of parameters among all the submodels that \(\hat{\boldsymbol\beta}(\gamma)\) presents.


About this article

Cite this article

Zhao, M., Batista, A., Cunningham, J.P. et al. An L1-regularized logistic model for detecting short-term neuronal interactions. J Comput Neurosci 32, 479–497 (2012). https://doi.org/10.1007/s10827-011-0365-5
