An L1-regularized logistic model for detecting short-term neuronal interactions

Abstract

Interactions among neurons are a key component of neural signal processing. Rich neural data sets potentially containing evidence of interactions can now be collected readily in the laboratory, but existing analysis methods are often not sufficiently sensitive and specific to reveal these interactions. Generalized linear models offer a platform for analyzing multi-electrode recordings of neuronal spike train data. Here we suggest an L1-regularized logistic regression model (L1L method) to detect short-term (order of 3 ms) neuronal interactions. We estimate the parameters in this model using a coordinate descent algorithm, and determine the optimal tuning parameter using a Bayesian Information Criterion. Simulation studies show that in general the L1L method has better sensitivities and specificities than those of the traditional shuffle-corrected cross-correlogram (covariogram) method. The L1L method is able to detect excitatory interactions with both high sensitivity and specificity with reasonably large recordings, even when the magnitude of the interactions is small; similar results hold for inhibition given sufficiently high baseline firing rates. Our study also suggests that false positives can be further removed by thresholding, because their magnitudes are typically smaller than those of true interactions. Simulations also show that the L1L method is somewhat robust to partially observed networks. We apply the method to multi-electrode recordings collected in the monkey dorsal premotor cortex (PMd) while the animal prepares to make reaching arm movements. The results show that some neurons interact differently depending on task conditions. The stronger interactions detected with our L1L method were also visible using the covariogram method.
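To make the pipeline concrete, here is a minimal sketch of the L1L idea in Python: regress one neuron's binned spikes on the recent spiking history of all recorded neurons, fit with an L1 penalty, and choose the tuning parameter by BIC. This is not the authors' code; the simulated Bernoulli spike trains, the 3-bin history window, and scikit-learn's saga solver (standing in for the paper's coordinate descent) are all assumptions made for illustration.

```python
# Hedged sketch of the L1L method described in the abstract.
# Assumptions: simulated 1-ms binned spike trains, a 3-bin history window,
# and scikit-learn's saga solver in place of the paper's coordinate descent.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_bins, n_neurons, lag = 5000, 5, 3
spikes = (rng.random((n_bins, n_neurons)) < 0.02).astype(float)

# Covariates for target neuron 0: spike indicators of every neuron at lags 1..3.
X = np.hstack([np.roll(spikes, k, axis=0) for k in range(1, lag + 1)])[lag:]
y = spikes[lag:, 0]

def bic(model, X, y):
    """BIC = -2 log-likelihood + d log n, with d the number of nonzero
    coefficients (the lasso degrees of freedom, cf. Zou et al. 2007)."""
    p = np.clip(model.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    d = np.count_nonzero(model.coef_)
    return -2 * loglik + d * np.log(len(y))

# Scan the regularization path and keep the BIC-minimizing fit.
fits = []
for C in np.logspace(-3, 1, 20):          # sklearn's C plays the role of 1/gamma
    m = LogisticRegression(penalty="l1", solver="saga", C=C,
                           max_iter=5000).fit(X, y)
    fits.append((bic(m, X, y), C, m))
best_bic, best_C, best_model = min(fits, key=lambda t: t[0])
# Nonzero entries of best_model.coef_ are the detected interaction terms.
```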

References

  1. Aertsen, A. M. H. J., Gerstein, G. L., Habib, M. K., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of ‘effective connectivity’. Journal of Neurophysiology, 61, 900–917.

  2. Avalos, M., Grandvalet, Y., & Ambroise, C. (2003). Regularization methods for additive models. In Advances in intelligent data analysis V.

  3. Batista, A. P., Santhanam, G., Yu, B. M., Ryu, S. I., Afshar, A., & Shenoy, K. V. (2007). Reference frames for reach planning in macaque dorsal premotor cortex. Journal of Neurophysiology, 98, 966–983.

  4. Brillinger, D. R. (1988). Maximum likelihood analysis of spike trains of interacting nerve cells. Biological Cybernetics, 59, 189–200.

  5. Brody, C. D. (1999). Correlations without synchrony. Neural Computation, 11, 1537–1551.

  6. Brown, E. N., Kass, R. E., & Mitra, P. P. (2004). Multiple neural spike train data analysis: State-of-the-art and future challenges. Nature Neuroscience, 7(5), 456–461.

  7. Chen, Z., Putrino, D. F., Ghosh, S., Barbieri, R., & Brown, E. N. (2010). Statistical inference for assessing functional connectivity of neuronal ensembles with sparse spiking data. IEEE Transactions on Neural Systems and Rehabilitation Engineering.

  8. Chestek, C. A., Batista, A. P., Santhanam, G., Yu, B. M., Afshar, A., Cunningham, J. P., et al. (2007). Single-neuron stability during repeated reaching in macaque premotor cortex. Journal of Neuroscience, 27(40), 10742–10750.

  9. Czanner, G., Grun, S., & Iyengar, S. (2005). Theory of the snowflake plot and its relations to higher-order analysis methods. Neural Computation, 17, 1456–1479.

  10. Ecker, A. S., Berens, P., Keliris, G. A., Bethge, M., Logothetis, N. K., & Tolias, A. S. (2010). Decorrelated neuronal firing in cortical microcircuits. Science, 327(5965), 584–587.

  11. Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Annals of Statistics, 32, 407–499.

  12. Eldawlatly, S., Jin, R., & Oweiss, K. G. (2009). Identifying functional connectivity in large-scale neural ensemble recordings: A multiscale data mining approach. Neural Computation, 21, 450–477.

  13. Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456), 1348–1360.

  14. Friedman, J., Hastie, T., Hofling, H., & Tibshirani, R. (2007). Pathwise coordinate optimization. Annals of Applied Statistics, 1(2), 302–332.

  15. Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1), 1–22.

  16. Fujisawa, S., Amarasingham, A., Harrison, M. T., & Buzsaki, G. (2008). Behavior-dependent short-term assembly dynamics in the medial prefrontal cortex. Nature Neuroscience, 11(7), 823–833.

  17. Gao, Y., Black, M. J., Bienenstock, E., Wei, W., & Donoghue, J. P. (2003). A quantitative comparison of linear and non-linear models of motor cortical activity for the encoding and decoding of arm motions. In First intl. IEEE/EMBS conf. on neural eng. (pp. 189–192).

  18. Gerstein, G. L., & Perkel, D. H. (1972). Mutual temporal relationships among neuronal spike trains: Statistical techniques for display and analysis. Biophysical Journal, 12, 453–473.

  19. Harrison, M. T., & Geman, S. (2009). A rate and history-preserving resampling algorithm for neural spike trains. Neural Computation, 21, 1244–1258.

  20. Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference and prediction. New York: Springer-Verlag.

  21. Kass, R. E., Kelly, R. C., & Loh, W. (2011). Assessment of synchrony in multiple neural spike trains using loglinear point process models. Annals of Applied Statistics, 5(2B), 1262–1292. (Special Section on Statistics and Neuroscience)

  22. Kelly, R. C., Smith, M. A., Kass, R. E., & Lee, T. S. (2010). Accounting for network effects in neuronal responses using L1 regularized point process models. In Advances in Neural Information Processing Systems (Vol. 23, pp. 1099–1107).

  23. Kohn, A., & Smith, M. A. (2005). Stimulus dependence of neuronal correlation in primary visual cortex of the macaque. Journal of Neuroscience, 25(14), 3661–3673.

  24. Kulkarni, J. E., & Paninski, L. (2007). Common-input models for multiple neural spike-train data. Network: Computation in Neural Systems, 18(5), 375–407.

  25. Matsumura, M., Chen, D., Sawaguchi, T., Kubota, K., & Fetz, E. E. (1996). Synaptic interactions between primate precentral cortex neurons revealed by spike-triggered averaging of intracellular membrane potentials in vivo. Journal of Neuroscience, 16(23), 7757–7767.

  26. McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2nd ed.). London: Chapman and Hall.

  27. Meinshausen, N., & Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Annals of Statistics, 37(1), 246–270.

  28. Mishchencko, Y., Vogelstein, J. T., & Paninski, L. (2011). Bayesian approach for inferring neuronal connectivity from calcium fluorescent imaging data. Annals of Applied Statistics, 5(2B), 1229–1261. (Special Section on Statistics and Neuroscience)

  29. Moran, D. W., & Schwartz, A. B. (1999). Motor cortical representation of speed and direction during reaching. Journal of Neurophysiology, 82, 2676–2692.

  30. Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15, 243–262.

  31. Park, M. Y., & Hastie, T. (2007). L1-regularization path algorithm for generalized linear models. Journal of the Royal Statistical Society, Series B, 69(4), 659–677.

  32. Peng, J., Wang, P., Zhou, N., & Zhu, J. (2009). Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association, 104(486), 735–746.

  33. Perkel, D. H., Gerstein, G. L., & Moore, G. P. (1967). Neuronal spike trains and stochastic point processes. II. Simultaneous spike trains. Biophysical Journal, 7, 414–440.

  34. Perkel, D. H., Gerstein, G. L., Smith, M. S., & Tatton, W. G. (1975). Nerve-impulse patterns: A quantitative display technique for three neurons. Brain Research, 100, 271–296.

  35. Qian, G., & Wu, Y. (2006). Strong limit theorems on the model selection in generalized linear regression with binomial responses. Statistica Sinica, 16, 1335–1365.

  36. Reid, R. C., & Alonso, J. M. (1995). Specificity of monosynaptic connections from thalamus to visual cortex. Nature, 378, 281–284.

  37. Rosset, S. (2004). Following curved regularized optimization solution paths. In Advances in neural information processing systems.

  38. Santhanam, G., Sahani, M., Ryu, S., & Shenoy, K. (2004). An extensible infrastructure for fully automated spike sorting during online experiments. In Conf. proc. IEEE eng. med. biol. soc. (Vol. 6, pp. 4380–4384).

  39. Stevenson, I. H., Rebesco, J. M., Hatsopoulos, N. G., Haga, Z., Miller, L. E., & Kording, K. P. (2009). Bayesian inference of functional connectivity and network structure from spikes. IEEE Transactions on Neural Systems and Rehabilitation Engineering (Special Issue on Brain Connectivity), 17(3), 203–213.

  40. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58, 267–288.

  41. Tibshirani, R. (1997). The lasso method for variable selection in the cox model. Statistics in Medicine, 16, 385–395.

  42. Truccolo, W., Eden, U. T., Fellows, M. R., Donoghue, J. P., & Brown, E. N. (2005). A point process framework for relating neural spiking activity to spiking history, neural ensemble, and extrinsic covariate effects. Journal of Neurophysiology, 93, 1074–1089.

  43. Truccolo, W., Hochberg, L. R., & Donoghue, J. P. (2010). Collective dynamics in human and monkey sensorimotor cortex: Predicting single neuron spikes. Nature Neuroscience, 13(1), 105–111.

  44. Wang, H., Li, B., & Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society, Series B, 71(3), 671–683.

  45. Wasserman, L., & Roeder, K. (2009). High-dimensional variable selection. Annals of Statistics, 37, 2178–2201.

  46. Wu, T. T., & Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Annals of Applied Statistics, 2(1), 224–244.

  47. Zhao, M., & Iyengar, S. (2010). Nonconvergence in logistic and poisson models for neural spiking. Neural Computation, 22, 1231–1244.

  48. Zohary, E., Shadlen, M. N., & Newsome, W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 370, 140–143.

  49. Zou, H., Hastie, T., & Tibshirani, R. (2007). On the “degrees of freedom” of the lasso. Annals of Statistics, 35(5), 2173–2192.

Acknowledgements

We thank Trevor Hastie and Erin Crowder for their advice during the early stages of this work. We thank Ashwin Iyengar for help with the scalable vector figures. The simulations were done using PITTGRID. We also thank the Action Editor and reviewers for their thoughtful comments.

Author information

Corresponding author

Correspondence to Mengyuan Zhao.

Additional information

Action Editor: Rob Kass

Appendix

The proof quotes two theorems and two lemmas from Qian and Wu (2006), one theorem from Fan and Li (2001), and one lemma from Park and Hastie (2007). To make them hold, we inherit the conditions (C.1)–(C.14) in Qian and Wu (2006) and conditions (A)–(C) in Fan and Li (2001); we refer the reader to those papers for the details. Without elaborating those conditions, we paraphrase the quoted results as the lemmas for our proof. Intuitively, conditions (C.1)–(C.6) are requirements on link functions in general, which the logit link does not violate (Qian and Wu 2006). Conditions (C.7)–(C.13) are requirements on the covariates, under which no single observation dominates as the sample size tends to infinity. Conditions (C.14) and (A)–(C) are requirements on the log-likelihood function, under which classical likelihood theory applies.

We denote by \(\boldsymbol\beta_0\) the true values of a collection of P parameters, of which only p are nonzero. Here we assume that both p and P are finite and do not vary with the sample size n. Denote the log-likelihood function for logistic regression by l. \(\mathcal{C}\) and \(\mathcal{W}\) are the sets of all correct models and all wrong models, respectively. \(\hat{\boldsymbol\beta}_c\) stands for the unregularized MLE under the assumption of model \(c\in\mathcal{C}\), and \(\hat{\boldsymbol\beta}_w\) stands for the unregularized MLE under the assumption of model \(w\in\mathcal{W}\). \(\hat{\boldsymbol\beta}(\gamma)\) stands for the \(L_1\)-regularized estimate at tuning parameter γ. A subscript c or w on \(\hat{\boldsymbol\beta}(\gamma)\) means that the nonzero estimates in \(\hat{\boldsymbol\beta}(\gamma)\) constitute model c or w.
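For reference in the proof, the BIC score that the γ-selector minimizes can be written, with \(d_\gamma\) denoting the number of nonzero coefficients in \(\hat{\boldsymbol\beta}(\gamma)\), as

$$ BIC(\gamma) = -2\,l\left(\hat{\boldsymbol\beta}(\gamma)\right) + d_\gamma\log n. $$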

Lemma 1

(Theorem 2 in Qian and Wu 2006) Under (C.1)–(C.14), for any correct model \(c\in\mathcal{C}\),

$$ 0\leq l\left(\hat{\boldsymbol\beta}_c\right)-l(\boldsymbol\beta_0)=O(\log\log n) \quad \mbox{a.s.} $$

Lemma 2

(Theorem 3 in Qian and Wu 2006) Under (C.1)–(C.14), for any wrong model \(w\in\mathcal{W}\),

$$ 0< l(\boldsymbol\beta_0)-l\left(\hat{\boldsymbol\beta}_w\right)=O(n) \quad \mbox{a.s.} $$

Lemma 3

(Theorem 1 in Fan and Li 2001) Under (A)–(C), there exists a local maximizer \(\hat{\boldsymbol\beta}(\gamma)\) of the \(L_1\)-regularized log-likelihood such that \(\|\hat{\boldsymbol\beta}(\gamma)-\boldsymbol\beta_0\|=O_p(n^{-1/2}+\gamma/n)\).

Lemma 4

(Lemma 4 in Qian and Wu 2006) Under (C.1)–(C.14), each component of \(\frac{\partial l}{\partial\boldsymbol\beta}(\boldsymbol\beta_0)\) is \(O(\sqrt{n\log\log n})\) a.s.

Lemma 5

(Lemma 6 in Qian and Wu 2006) Under (C.1)–(C.14), there exist two positive constants \(\kappa_1\) and \(\kappa_2\) such that the eigenvalues of \(-\partial^2l/\partial\boldsymbol\beta\partial\boldsymbol\beta'\) at \(\boldsymbol\beta_0\) are bounded below by \(\kappa_1 n\) and above by \(\kappa_2 n\) a.s. as n goes to infinity.

Lemma 6

(Lemma 1 in Park and Hastie 2007) If the intercept in the logistic model is not regularized, then when \(\gamma>\max_{j=1,\ldots,P}\left|\left(\frac{\partial l}{\partial\boldsymbol\beta}\right)_j\right|\), the intercept is the only nonzero coefficient.

Proof of the Theorem

Let \(\gamma_1>\gamma_2\). Denote by \(m_1\) the model consisting of the \(d_1\) nonzero parameters in \(\hat{\boldsymbol\beta}(\gamma_1)\), and by \(m_2\) the model consisting of the \(d_2\) nonzero parameters in \(\hat{\boldsymbol\beta}(\gamma_2)\). Therefore,

$$ \begin{array}{rll} BIC(\gamma_1)-BIC(\gamma_2) & = & -2l\left(\hat{\boldsymbol\beta}(\gamma_1)\right)+d_1\log n\\ && -\left[-2l\left(\hat{\boldsymbol\beta}(\gamma_2)\right)+d_2\log n\right]\\ & = & (d_1-d_2)\log n+2\left[l\left(\hat{\boldsymbol\beta}(\gamma_2)\right)-l\left(\hat{\boldsymbol\beta}(\gamma_1)\right)\right]\\ & = & (d_1-d_2)\log n\\ && +\,2\left[l\left(\hat{\boldsymbol\beta}(\gamma_2)\right)-l\left(\hat{\boldsymbol\beta}_{m_2}\right) +l\left(\hat{\boldsymbol\beta}_{m_2}\right)-l\left(\hat{\boldsymbol\beta}_{m_1}\right)\right.\\ && \left.+\,l\left(\hat{\boldsymbol\beta}_{m_1}\right)-l\left(\hat{\boldsymbol\beta}(\gamma_1)\right)\right]. \end{array} $$

If \(m_1\in\mathcal{C}\) and \(m_2\in\mathcal{C}\), by Lemma 1 we have \((d_1-d_2)\log n=O(\log n)<0\) and \(l(\hat{\boldsymbol\beta}_{m_2})-l(\hat{\boldsymbol\beta}_{m_1})=O(\log\log n)>0\). By the definition of maximum likelihood, we also have \(l(\hat{\boldsymbol\beta}(\gamma_2))-l(\hat{\boldsymbol\beta}_{m_2})<0\). Therefore, as long as \(l(\hat{\boldsymbol\beta}_{m_1})-l(\hat{\boldsymbol\beta}(\gamma_1))=o(\log n)\), we have \(BIC(\gamma_1)-BIC(\gamma_2)<0\), and the correct model \(m_1\) with the smaller number of parameters is selected.

If \(m_1\in\mathcal{W}\) and \(m_2\in\mathcal{C}\), by Lemma 2 we have \((d_1-d_2)\log n=O(\log n)<0\) and \(l(\hat{\boldsymbol\beta}_{m_2})-l(\hat{\boldsymbol\beta}_{m_1})=O(n)>0\). Again by the definition of maximum likelihood, we have \(l(\hat{\boldsymbol\beta}_{m_1})-l(\hat{\boldsymbol\beta}(\gamma_1))>0\). Therefore, as long as \(l(\hat{\boldsymbol\beta}(\gamma_2))-l(\hat{\boldsymbol\beta}_{m_2})=o(n)\), we have \(BIC(\gamma_1)-BIC(\gamma_2)>0\), and the correct model \(m_2\) is selected.

Thus it remains to show that, for any \(c\in\mathcal{C}\), we have \(l(\hat{\boldsymbol\beta}_{c})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\). Because \(l(\hat{\boldsymbol\beta}_{c})-l(\boldsymbol\beta_{0})=O(\log\log n)\), it suffices to show \(l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\). By a Taylor expansion around \(\boldsymbol\beta_0\), we have

$$ \begin{array}{rll} l(\boldsymbol\beta)-l(\boldsymbol\beta_{0}) &=& (\boldsymbol\beta-\boldsymbol\beta_{0})'\frac{\partial l(\boldsymbol\beta_0)}{\partial\boldsymbol\beta}\\ &&+\,\frac{1}{2}(\boldsymbol\beta-\boldsymbol\beta_{0})' \frac{\partial^2l(\boldsymbol\beta_0)}{\partial\boldsymbol\beta\partial\boldsymbol\beta'}(\boldsymbol\beta-\boldsymbol\beta_{0})\\ &&+\, o\left(\left\|\boldsymbol\beta-\boldsymbol\beta_0\right\|^2\right). \end{array} $$

So, evaluating at \(\boldsymbol\beta=\hat{\boldsymbol\beta}_c(\gamma)\) and applying Lemmas 3, 4 and 5, we have

$$ \begin{array}{rll} l(\boldsymbol\beta_{0})-l\left(\hat{\boldsymbol\beta}_c(\gamma)\right) &=& O\left(1/\sqrt{n}+\gamma/n\right)O\left(\sqrt{n\log\log n}\right)\\ &&+\,O(n)O\left(\left(1/\sqrt{n}+\gamma/n\right)^2\right). \end{array} $$

When \(\gamma=o(\sqrt{n\log n})\), we have \(l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\).
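Spelling out the order arithmetic behind this step: \(\gamma=o(\sqrt{n\log n})\) gives \(n^{-1/2}+\gamma/n=o\big(\sqrt{\log n/n}\big)\), so

$$ \begin{array}{rll} l(\boldsymbol\beta_{0})-l\left(\hat{\boldsymbol\beta}_c(\gamma)\right) &=& o\left(\sqrt{\frac{\log n}{n}}\right)O\left(\sqrt{n\log\log n}\right)+O(n)\,o\left(\frac{\log n}{n}\right)\\ &=& o\left(\sqrt{\log n\,\log\log n}\right)+o(\log n)\;=\;o(\log n). \end{array} $$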

Finally, Lemma 6 says that when \(\gamma>\max_j\left|\left(\frac{\partial l}{\partial\boldsymbol\beta}\right)_j\right|\), which by Lemma 4 is \(O(\sqrt{n\log\log n})\) a.s., the fit is the null model containing only an intercept. Hence we never need a tuning parameter larger than \(O(\sqrt{n\log\log n})\), which is \(o(\sqrt{n\log n})\). Therefore \(l(\boldsymbol\beta_{0})-l(\hat{\boldsymbol\beta}_c(\gamma))=o(\log n)\) is achievable for all correct models given by \(\hat{\boldsymbol\beta}(\gamma)\), and the BIC γ-selector selects the correct model with the smallest number of parameters among all the submodels that \(\hat{\boldsymbol\beta}(\gamma)\) presents.
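As a quick numerical illustration of Lemma 6 (a sketch, not part of the proof): for logistic regression with a free intercept, the path-terminating penalty is \(\gamma_{\max}=\max_j|\sum_i x_{ij}(y_i-\bar y)|\), the score evaluated at the intercept-only fit. The scikit-learn parameterization \(C=1/\gamma\), the saga solver, and the simulated data below are assumptions for this example.

```python
# Check of Lemma 6: with gamma just above gamma_max, the L1-regularized
# logistic fit keeps only the intercept; below gamma_max, coefficients enter.
# Illustrative only: data are simulated and C = 1/gamma is assumed.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 10))
y = (rng.random(2000) < 0.3).astype(float)

# Score of the log-likelihood at the intercept-only fit: X'(y - ybar).
gamma_max = np.max(np.abs(X.T @ (y - y.mean())))

for gamma in (1.05 * gamma_max, 0.5 * gamma_max):
    m = LogisticRegression(penalty="l1", solver="saga", C=1.0 / gamma,
                           max_iter=5000).fit(X, y)
    print(gamma, np.count_nonzero(m.coef_))   # expect 0, then > 0
```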

About this article

Cite this article

Zhao, M., Batista, A., Cunningham, J.P. et al. An L1-regularized logistic model for detecting short-term neuronal interactions. J Comput Neurosci 32, 479–497 (2012). https://doi.org/10.1007/s10827-011-0365-5

Keywords

  • Multi-electrode recording
  • Model selection
  • Coordinate descent
  • BIC
  • Premotor cortex