Functional identification of biological neural networks using reservoir adaptation for point processes
Abstract
The complexity of biological neural networks does not allow us to directly relate their biophysical properties to the dynamics of their electrical activity. We present a reservoir computing approach for functionally identifying a biological neural network, i.e. for building an artificial system that is functionally equivalent to the reference biological network. Employing feedforward and recurrent networks with fading memory, i.e. reservoirs, we propose a point process based learning algorithm to train the internal parameters of the reservoir and the connectivity between the reservoir and the memoryless readout neurons. Specifically, the model is an Echo State Network (ESN) with leaky integrator neurons, whose individual leakage time constants are also adapted. The proposed ESN algorithm learns a predictive model of stimulus-response relations in in vitro and simulated networks, i.e. it models their response dynamics. Receiver Operating Characteristic (ROC) curve analysis indicates that these ESNs can imitate the response signal of a reference biological network. Reservoir adaptation improved the performance of an ESN over readout-only training methods in many cases. This also held for adaptive feedforward reservoirs, which had no recurrent dynamics. We demonstrate the predictive power of these ESNs on various tasks with cultured and simulated biological neural networks.
Keywords
Cultured neural networks · Echo State Networks · Reservoir computing
1 Introduction
One central goal in neuroscience is to understand how the brain represents, processes, and conveys information. Starting from Hebb’s cell assemblies (Brown and Milner 2003; Hebb 1949), many neurobiologically founded theories and hypotheses have been developed towards this goal. It is now clear that spikes are the elemental quanta of information processing in the mammalian cortex (Hodgkin and Huxley 1952; Kandel et al. 2000). As a result of extensive cortical recording experiments, it has been widely postulated and accepted that the function and information of the cortex are encoded in the spatiotemporal network dynamics (Abeles et al. 1993; Lindsey et al. 1997; Mainen and Sejnowski 1995; Prut et al. 1998; Riehle et al. 1997; Villa et al. 1999). The right level of describing the dynamics, however, is a matter of intensive discussion (Prut et al. 1998; Rieke et al. 1999). Are spike rates or spike timings more relevant? What is the right temporal precision if the latter proves significant? What should be the spatial resolution of this description? How far can the population activity of neurons be related to function or behavior? Does the correlated activity of multiple neurons indicate a functionally relevant state?
Depending on the answers to the above questions one would preferably apply different models to relate the network activity to function. Another approach is to employ a generic network model, which can be assumed to be universal for problems of neural encoding. The parameters of the model would be learned by adaptive algorithms. Obviously, such a model should be able to deal with single spikes with high temporal precision as well as population rates. It should also be able to catch, with the appropriate parameters, network synchrony and polychrony (Izhikevich 2006).
1.1 Reservoir computing
Liquid State Machines (LSMs) and Echo State Networks (ESNs) have been introduced as efficiently trainable recurrent neural networks (Jaeger 2001; Maass et al. 2002). The common key contribution of these approaches is the proposal of a recurrent neural network with fixed connectivity, i.e. a reservoir, which does not have stable states and has a fading memory of previous inputs and network states. In response to an input stream, the reservoir generates higher-dimensional spatiotemporal dynamics reflecting the structure in the input stream. The higher-dimensional reservoir state can be mapped online to a target output stream by a second module, namely a readout. LSMs are networks of spiking integrate-and-fire (IAF) neurons, whereas ESNs use continuous-valued sigmoid neurons and a single layer of readout neurons (see Section 2).
With the appropriate parameters, reservoir dynamics can be sensitive to different features of the input such as correlations, polychronous and synchronous spikes, different frequency bands or even temporally precise single spikes. For instance, LSMs can approximate any time-invariant filter with fading memory if their traces of internal states differ for at least one time point in response to any two different input streams (Separation Property, SP) and if the readout modules can approximate any continuous function from ℝ^{m} → ℝ, where m ∈ ℕ (Approximation Property, AP) (Maass et al. 2002). Satisfying the separation property depends on whether the reservoir is composed of sufficient basis filters. A random reservoir needs a rich repertoire of basis filters in order to approximate the target time-invariant filter. This could be achieved for several tasks with sufficiently large random reservoirs (Maass et al. 2002). Furthermore, reservoirs have been shown to simulate a large class of higher order differential equations with appropriate feedback and readout functions (Maass et al. 2007). These findings suggest that reservoir computing can be used as a generic tool for problems of neural encoding.
Biological neural networks, in vivo, process a continuous stream of inputs in real time. Moreover, they react and compute quickly when prompted by sudden changes in stimuli, independent of their instantaneous state and ongoing activity. In other words, they perform anytime computing. Reservoir computing has also been suggested as a model of cortical information processing for its capability of online and anytime computing, for its fading memory and for its separation properties. It has been argued that specific cortical circuitry can be built into the generic LSM framework (Maass et al. 2004). Bringing reservoir computing to the problem not only delivers expressive models that can distinguish a rich set of input patterns, but may also lend more biological relevance to the theoretical tool.
1.2 Neuronal cell cultures as biological neural networks
While brain tissue has a highly specialized architecture and developmental history, generic biological networks can be created as cell cultures of mammalian cortical neurons that have been dissociated and regrown outside an organism. They are closed-system in vitro living networks, which are frequently used to study physiological properties of neural systems (Marom and Shahaf 2002). Although the anatomical structure of the brain is not preserved in cultures, their inherent properties as networks of real neurons, their accessibility and their small size make them very attractive for investigating information processing in biological networks. Using cultured networks also eliminates the problem of interference with ongoing activity from different parts of the brain. Compared to in vitro brain slices, cultured networks are more accessible in terms of the ratio of recorded neurons to the whole network. In other words, they pose a less serious undersampling problem. Studying such networks can provide insight into the generic properties of biological neural networks, independent of a specific anatomy (Marom and Shahaf 2002). Another motivation to study cultures is understanding and employing them as biological computers. For instance, neuronal cell cultures have been shown to be able to learn. Shahaf and Marom (2001) demonstrated how the response of the network to a stimulus can be ‘improved’ by systematically training the culture. Jimbo et al. (1999) showed how neuronal cultures can increase or decrease their overall response to pulse stimuli through tetanic stimulation training.
1.3 Problem statement
Although we cannot relate the activity dynamics to a physiological function in random in vitro BNNs, by studying their activity dynamics we gain experience and information about generic network properties forming the basis of in vivo networks. Moreover, one can assign pseudo-functions to random BNNs by artificially mapping network states to a predefined set of actions (Chao et al. 2008). One can also regard the response spike train of a BNN to a stream of various stimuli as a very detailed characterization of its pseudo-function and aim at modeling stimulus-response relations. In the present work, we take this approach. We record the output responses of simulated and cultured BNNs to random multivariate streams of stimuli. We tackle the question of whether it is possible to train an artificial neural network that predicts the response of a reference biological neural network under the applied stimulus range. In other words, we aim at generating an equivalent network of a BNN in terms of stimulus-response relations. Given the same stimulus, the equivalent network should predict the output of the biological neural network.
A model or a predictor for biological neural networks can be useful for relating the physiological and physical determinants of their activity and can thereby be a tool for analyzing information coding in these networks. It can also be helpful for interacting with BNNs by means of electrical stimulation. Here, we employ Echo State Networks (ESN) as a reservoir computing tool and propose an algorithm to find appropriate models for the relations between continuous input streams and multi-unit recordings in biological neural networks. The algorithm uses the point process log-likelihood as an optimization criterion and adapts the readout and reservoir parameters of ESNs accordingly. Moreover, we investigate the performance of our approach on different feedforward and recurrent reservoir structures and demonstrate its applicability to the stimulus-response modeling problem in BNNs.
We briefly review ESNs in Section 2 and point process modeling of spike data in Section 3. An elaboration of ESN adaptation for point processes is presented in Section 4. In Section 5 we present our evaluation methods. The experiments and their implications can be found in Sections 6 and 7, respectively.
2 Echo State Networks
2.1 Dynamics
The reservoir state and the output are updated according to the standard ESN equations (Jaeger 2001), \(x^{n+1} = f(W x^{n} + W^{in} u^{n+1})\) and \(y^{n+1} = W^{out} [x^{n+1}; u^{n+1}]\), where f is applied element-wise and
W : N × N internal reservoir weight matrix,
W^{in} : N × K input weight matrix,
W^{out} : L × (N + K) output weight matrix,
x^{n} : N × 1 state vector for time step n,
u^{n} : K × 1 input vector for time step n,
y^{n} : L × 1 output vector for time step n,
';' : vertical vector concatenation.
Note that ESNs can have feedback projections from the readout module to the reservoir (Jaeger 2001). Throughout this note, however, we employ ESN architectures without such feedback projections. We choose \(f(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}\), which allows for the existence of echo states.
2.2 ESN with leaky integrators
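As an illustration, one update step of a leaky-integrator ESN can be sketched in Python. This is a minimal sketch of the standard formulation (Jaeger 2001), not the implementation used in this work; the leakage rate `a`, the weight scaling and all network sizes are arbitrary choices for the example:

```python
import numpy as np

def esn_step(W, W_in, W_out, x, u, a=0.3):
    """One leaky-integrator ESN update (minimal sketch, not the paper's code).

    `a` is the leakage rate; a = 1 recovers the standard (non-leaky) update.
    """
    x_new = (1.0 - a) * x + a * np.tanh(W @ x + W_in @ u)  # leaky state update
    y = W_out @ np.concatenate([x_new, u])                 # memoryless linear readout
    return x_new, y

# Tiny example with N = 4 reservoir units, K = 2 inputs, L = 1 output.
rng = np.random.default_rng(0)
N, K, L = 4, 2, 1
W = 0.5 * rng.standard_normal((N, N)) / np.sqrt(N)  # small weights for fading memory
W_in = rng.standard_normal((N, K))
W_out = rng.standard_normal((L, N + K))
x = np.zeros(N)
for _ in range(10):
    x, y = esn_step(W, W_in, W_out, x, rng.standard_normal(K))
```

In this paper, the individual leakage time constants (here a single global `a`) are themselves adapted during learning.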
2.3 FeedForward Echo State Networks
2.4 ESN learning
Although ESN learning is restricted to readout learning in many cases, there have recently been several approaches to adapting reservoirs, among which a significant improvement over untrained reservoirs was reported by Steil (2007): adopting the intrinsic plasticity rule from real biological systems improved the performance of the backpropagation-decorrelation algorithm.
In the current approach, we adapt the reservoir connectivity and the individual neuronal time constants with a one-step propagation of the log-likelihood into the reservoir. This is basically a gradient descent approach, where the log-likelihood of a point process is used as the optimization criterion. We elaborate our approach in Section 4.
3 Point process modeling of spike data
Signals that are comprised of point-like events in time, e.g. trains of action potentials generated by neurons, can be characterized in terms of stochastic point processes (Brown et al. 2001; Chornoboy et al. 1988; Okatan et al. 2005; Rajaram et al. 2005). A spiking process, modeled as a point process, is in turn fully characterized by its conditional intensity function (Cox and Isham 1980; Daley and Vere-Jones 2003).
The quality of point process modeling strongly depends on the expressive power of the conditional intensity function \(\lambda(t \mid I_t, H_t, \theta)\), where \(I_t\) denotes the input history, \(H_t\) the spiking history and \(\theta\) the model parameters. Linear or log-linear models might be preferable for successful adaptation by gradient descent, whereas nonlinear functions allow for more expressive models. In the current work, we model this function by an Echo State Network, which inherently incorporates the input and network history into the instantaneous network states.
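In discrete time with bin width Δ, the point process log-likelihood of an observed spike train under an estimated conditional intensity is commonly approximated as \(\log L = \sum_k [\, n_k \log(\lambda_k \Delta) - \lambda_k \Delta \,]\), where \(n_k\) is the spike count in bin k. A small illustrative sketch (bin width and example intensities are arbitrary, not values from the paper):

```python
import numpy as np

def point_process_log_likelihood(lam, spikes, dt=0.005):
    """Discrete-time point-process log-likelihood (sketch):
    sum_k [ n_k * log(lam_k * dt) - lam_k * dt ]."""
    lam = np.asarray(lam, dtype=float)   # conditional intensity per bin (Hz)
    n = np.asarray(spikes, dtype=float)  # spike counts per bin
    return float(np.sum(n * np.log(lam * dt) - lam * dt))

# An intensity that is high exactly in the bins containing spikes
# scores a higher likelihood than one that is high elsewhere.
spikes = np.array([0, 1, 0, 1, 0])
good = point_process_log_likelihood(np.where(spikes == 1, 50.0, 1.0), spikes)
bad = point_process_log_likelihood(np.where(spikes == 1, 1.0, 50.0), spikes)
```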
4 Point process modeling with ESN and log likelihood propagation
4.1 Reservoir adaptation
One interesting question is whether it is sufficient to learn the output weights or whether one needs to adapt the reservoir using the point process log-likelihood as the fitness criterion. In the current work, we adapt recurrent and feedforward reservoirs with a one-step propagation of the point process log-likelihood into the reservoir and compare the results to non-adaptive reservoirs.
4.1.1 Adapting neuronal time constants
The complexity of readout-only learning is O(LN) for each time step, where L is the number of readout units. Running the trained ESN on the test input stream also has a complexity of O(LN) for sparse reservoirs, where the number of connections scales linearly with the number of reservoir units. All the reservoirs we use belong to this sparse type. In this case, the complexity of the one-step log-likelihood propagation is also O(LN).
4.2 Existence of local maxima and confidence intervals
Due to the deep architecture and the reservoir transfer function, we cannot guarantee that gradient descent in the reservoir parameters yields the uniquely true, i.e. globally optimal, parameters. The point process log-likelihood is not concave with respect to the whole set of parameters. Reservoir adaptation, as a result, is not a tool for finding a globally optimal equivalent of a given BNN. The quality of reservoir adaptation is instead evaluated by its improvement in predictive performance over fixed reservoirs. For a fixed reservoir, on the other hand, learning reduces to readout-only learning, and hence, to adapting the parameters of a Generalized Linear Model (McCullagh and Nelder 1989; Paninski 2004), which maps reservoir states and input to the conditional intensity. In this case, the point-process log-likelihood is a concave function of the readout parameters (see Appendix 2) and has no non-global local maxima with respect to them (Paninski 2004). Gradient descent in the readout parameters will therefore asymptotically reach a global maximum. This absence of non-global maxima holds for the readout parameters under a fixed reservoir, or if an adaptive reservoir is fixed after some epochs of training. Note that we take an online (stochastic) gradient descent approach in this work. Although online gradient descent takes a stochastic path through the parameter space, empirical evidence suggests that its average behavior is not affected by the online updates (Bottou 2004).
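For a log-linear readout, one online gradient-ascent step on the discretized point-process log-likelihood takes the familiar GLM form \(\Delta w \propto (n_k - \lambda_k \Delta)\, x_k\). The toy sketch below illustrates this; the exponential link, the learning rate and the synthetic data are assumptions of the example, not the paper's exact readout:

```python
import numpy as np

def readout_sgd_step(w, state, spike, dt=0.005, lr=0.1):
    """One online gradient-ascent step for a log-linear readout
    lam = exp(w . state):  d logL / dw = (n - lam*dt) * state."""
    lam = np.exp(w @ state)
    return w + lr * (spike - lam * dt) * state

# Toy data: spikes are generated with a true weight of 2.0 on the first
# state component; online updates should push w[0] in that direction.
rng = np.random.default_rng(1)
w = np.zeros(3)
for _ in range(2000):
    s = rng.standard_normal(3)
    lam_true = np.exp(2.0 * s[0])
    spike = float(rng.random() < min(lam_true * 0.005, 1.0))
    w = readout_sgd_step(w, s, spike)
```

Because this objective is concave in `w`, such online steps approach the global maximum on average.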
5 Evaluation of the learned models
We compared feedforward and recurrent architectures to the baseline method in which only the readout weights were adapted. For different ESN types and architectures, we comparatively evaluated their capabilities in modeling BNN stimulus-response relations by testing the predictive power of the ESN on the observed spike trains. The continuous ESN output signal was tested for compatibility with the actual observed spikes.
As the ESN output is continuous, the true positive rate and the false positive rate vary with a moving decision threshold. An example ROC curve is shown in Fig. 6 (bottom). In contrast to classification accuracy, the area under an ROC curve (AUC) is more robust with respect to prior class probabilities (Bradley 1997). An AUC around 0.5 indicates a random prediction, whereas the AUC of a perfect prediction equals 1.0.
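The AUC can be computed directly as the Mann–Whitney rank statistic, i.e. the probability that a randomly chosen spike bin receives a higher score than a randomly chosen non-spike bin. A minimal sketch (quadratic in the number of bins; a rank-based implementation would scale better):

```python
import numpy as np

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparisons:
    the fraction of (positive, negative) pairs where the positive
    bin outscores the negative one; ties count half."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))
```

Perfect separation yields 1.0, inverted scores yield 0.0, and uninformative scores hover around 0.5, matching the interpretation above.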
6 Experimental results
We employed Echo State Networks with different types and sizes of reservoirs on spike prediction tasks, where the spikes were recorded from simulations of random cortical networks and from cultured networks of cortical neurons. We investigated whether ESNs successfully predict output spikes of BNNs when they are presented with the same input streams.
6.1 Simulations of random cortical networks
We simulated 10 surrogate cortical neural networks of 1000 neurons each with 0.1 random connectivity. We used the Izhikevich neuron model and included spike-timing dependent plasticity (STDP) in our simulations (Izhikevich 2006). Every 5 ms, one of the 100 input channels was selected at random from a uniform distribution and a pulse of 5 ms width was sent to the network. Each input channel had excitatory projections to 10 % of the neurons in the network. The networks were simulated for 2 h of real time with STDP and for a further 0.5 h of real time with frozen synapses. In each network, the spikes from a randomly selected excitatory neuron were recorded with 5 ms discretization. We then trained ESNs to estimate the conditional intensity of the selected neuron’s spiking process from the history of the input pulses.
LI-ESNs with three different reservoir types of various sizes were used for this task, namely fixed recurrent reservoirs (Rfxd), recurrent adaptive reservoirs (Radp) and feedforward adaptive (FFadp) reservoirs. The reservoir connectivity and each individual time constant were adapted using the method described in Section 4. For each of the 10 surrogate biological neural networks, we used reservoirs of sizes 100, 500 and 1000. A separate random sparse reservoir for each size and for each surrogate biological network was generated, where each reservoir unit was connected to 10 other units. For each random reservoir, a feedforward reservoir was generated by inverting the edges that cause recurrence, as described in Fig. 4. Note that the same-sized feedforward and recurrent reservoirs for the same biological neural network had the same topology apart from edge directions. This yielded 60 different reservoirs for the whole experiment.
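The edge-inversion construction can be sketched as follows, assuming units are taken in their index order and every edge pointing against this order is re-attached in the opposite direction (the exact ordering rule of Fig. 4 may differ; dropping self-loops is an assumption of this sketch):

```python
import numpy as np

def make_feedforward(W):
    """Turn a random reservoir weight matrix into a feedforward one
    by inverting recurrence-causing edges (sketch of the Fig. 4 idea).

    W[i, j] is the weight of the edge j -> i.  Edges with j < i respect
    the unit ordering and are kept; edges with j > i are 'backward' and
    are re-attached as i -> j; self-loops (j == i) are dropped here."""
    N = W.shape[0]
    F = np.zeros_like(W)
    for i in range(N):          # postsynaptic unit
        for j in range(N):      # presynaptic unit
            if W[i, j] == 0.0:
                continue
            if j < i:           # already feedforward: keep
                F[i, j] += W[i, j]
            elif j > i:         # backward edge: invert its direction
                F[j, i] += W[i, j]
    return F

# Edge 0 -> 1 (weight 0.3) is kept; edge 1 -> 0 (weight 0.5) is inverted;
# the self-loop on unit 1 (weight 0.1) is dropped.
W = np.array([[0.0, 0.5], [0.3, 0.1]])
F = make_feedforward(W)
```

The result is strictly lower triangular, i.e. the inverted graph is acyclic, while the undirected topology (up to self-loops) is preserved.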
From each of the 10 BNN simulations we recorded data for 30 min of real time. Using a discretization time bin of 5 ms, each simulation yielded time series data for 360,000 time steps, of which 328,000 time steps were used for training and 30,000 time steps for testing in each sub-experiment. For the adaptive reservoirs, the training phase included 20 full adaptation iterations of the reservoirs and 60 readout adaptation iterations. One iteration covered one forward traversal of the training data in time. No learning was performed on the test data.
Areas under ROC curves (in %) for the prediction of activity in 10 simulated biological neural networks with fixed recurrent (Rfxd), feedforward adaptive (FFadp) and recurrent adaptive (Radp) reservoirs
Res. size:  100  500  1000  

Res. type:  Rfxd  FFadp  Radp  Rfxd  FFadp  Radp  Rfxd  FFadp  Radp 
BNN1  74.0  78.3  76.8  81.3  81.4  80.6  81.7  82.3  82.1 
BNN2  76.7  80.6  78.1  84.1  84.8  84.1  84.7  85.5  85.3 
BNN3  69.4  73.5  70.7  77.0  77.6  77.3  77.4  78.1  78.6 
BNN4  76.6  78.9  78.0  81.4  81.6  81.2  82.3  82.9  82.6 
BNN5  74.9  78.0  77.4  81.5  81.4  81.2  81.7  82.2  82.2 
BNN6  79.0  82.0  81.3  84.8  84.9  84.7  84.9  85.3  85.5 
BNN7  72.0  74.4  74.1  78.2  79.0  78.7  79.2  79.5  79.9 
BNN8  71.7  73.0  72.3  76.8  77.5  76.5  78.1  78.1  78.5 
BNN9  72.3  74.9  74.6  78.5  78.4  78.4  78.8  79.2  79.1 
BNN10  79.0  80.7  79.0  83.8  83.9  83.4  83.9  84.5  84.3 
mean  74.6  77.4 ∙ ∗  76.2 ∙  80.7  81.1 ∙ ∗  80.6  81.3  81.8 ∙  81.8 ∙ 
We further investigated whether the predictive power of ESNs with feedforward adaptive reservoirs of leaky integrator neurons is comparable to that of recurrent reservoirs. Adaptive reservoirs are in general better than non-adaptive reservoirs. This difference, however, vanishes with increasing reservoir size (Table 1). In these experiments, feedforward adaptive reservoirs significantly outperformed the non-adaptive recurrent reservoirs of all sizes. Furthermore, they significantly outperformed recurrent adaptive reservoirs for sizes of 100 and 500. Although adaptation brought significant improvement also to recurrent reservoirs for sizes 100 and 1000, feedforward adaptation performed in general better than recurrent adaptation in estimating the conditional intensity. Small reservoirs (i.e. for N = 100) of the feedforward architecture were drastically superior to the recurrent reservoirs.
6.1.1 Approximating gradient vectors with onestep propagation
6.1.2 Information encoded in reservoir activity
6.2 Prediction of spontaneous events in neuronal cultures
To test our approach on living neural networks, we aimed at predicting spontaneous events in neuronal cultures. We defined an event as a group of spikes recorded by the same MEA electrode whose maximum inter-spike interval is less than 60 ms. The time of the first spike in the group is regarded as the event time. It should be noted again that the activity in neuronal cultures is typically a sequence of network bursts. If there were at least 100 ms between two successive events, we regarded them as parts of different bursts. These criteria mostly excluded isolated spikes from network bursts and clustered fast repetitive firings at an electrode, e.g. a cellular burst, into a single event. We recorded spikes from a neuronal cell culture for 30 min, detecting bursts and events according to the above definition. We used the data for training, except for the last 200 s, which were reserved for testing. We selected approximately three quarters of the 60 MEA electrodes and treated their events as the input stream, and the remaining electrodes as output. This selection was based on the spatial order of the electrodes, i.e. input and output electrodes occupied two distinct areas in the culture. If an electrode had never recorded spikes in the training event train, it was regarded as inactive and was excluded from the learning and testing processes. The evaluation task was to estimate the conditional intensity of the output events for each output electrode at time step n using the total input event stream until time step n (1 ms bin size). Note that only very few events occur outside the bursts. Therefore, the algorithm performs learning and prediction only during bursts.
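The event definition above amounts to a single grouping pass over the sorted spike times of one electrode. A minimal sketch (the 60 ms threshold is from the text; the handling of an empty input is an assumption of the example):

```python
def spikes_to_events(spike_times, max_isi=0.060):
    """Group sorted spike times (in seconds) from one electrode into events.

    Spikes closer than `max_isi` to their predecessor belong to the same
    event; the event is timestamped by its first spike.  (Burst separation,
    i.e. gaps over 100 ms between events, would be a second pass.)"""
    events, last = [], None
    for t in spike_times:
        if last is None or t - last > max_isi:
            events.append(t)  # first spike of a new event marks its time
        last = t
    return events

# Spikes at 0, 10 and 20 ms form one event; the spike at 200 ms starts
# a new event because it follows a gap larger than 60 ms.
ev = spikes_to_events([0.000, 0.010, 0.020, 0.200, 0.210])
```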
Based on estimates for all time steps during network bursts in the last 200 s, we evaluated the predictive performance of the learned ESN models using ROC curve analysis on the target events and estimated conditional intensity. We selected the electrodes that recorded at least 15 events within the evaluation window of 200 s.
Areas under ROC curves for the event timing prediction task aggregated over active output electrodes
Res. size  10  30  50  100  300  500  1000 
Without history  
Rfxd  59.9  57.9  74.3  78.9  78.3  82.7  83.2  
FFadp  59.3  64.1  72.5  76.3  83.8  82.3  83.9  
Radp  63.9  68.9  69.2  76.2  82.3  85.8  84.2  
With history  
Rfxd  57.2  62.4  69.9  76.7  79.8  82.5  82.1  
FFadp  57.3  66.4  69.6  76.6  80.1  82.1  83.4  
Radp  59.8  68.1  74.3  79.7  79.4  85.1  83.6  
Rate based prediction  
Kernel(ms):  3  5  10  20  30  50  70  100  150  250 
AUC:  60.7  64.0  69.3  75.4  73.9  71.3  69.9  66.5  61.2  54.6 
Mean AUC (%) in the event timing prediction task in 3 different cultures
Channel  Culture 1  Culture 2  Culture 3  

Rfxd  FFadp  Radp  Rfxd  FFadp  Radp  Rfxd  FFadp  Radp  
All  81.6  83.1  83.6  82.7  82.6  82.2  66.7  66.4  67.1 
Best  84.2  87.0  89.3  92.3  91.9  92.2  83.8  84.7  82.2 
Worst  78.8  77.8  76.6  75.6  73.9  74.2  50.4  48.3  53.8 
Pairwise comparison of the reservoir types and methods in the event timing prediction task in 3 different cultures using 10 random reservoirs all with size 500
 Culture 1  Culture 2  Culture 3  

# output el.: 6  # output el.: 10  # output el.: 9  
Rfxd  FFadp  Radp  Rfxd  FFadp  Radp  Rfxd  FFadp  Radp  
Rfxd  0  0  1  0  1  3  0  2  1 
FFadp  2  0  0  1  0  2  1  0  2 
Radp  3  2  0  1  0  0  2  3  0 
6.3 Nextevent prediction in neuronal cultures
Recent findings indicate that the temporal order of events in neuronal cultures carries significant information about the type of stimuli (Shahaf et al. 2008). In this experimental setting, we investigated whether ESNs can model the structure in the temporal order of the events. We discarded the event timing information from the previous setting and, as a result, obtained data of temporally ordered events. More precisely, we used operational time steps for the ESN analysis, i.e. each event appeared in exactly one time step and each time step contained exactly one event.
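Constructing the operational time axis amounts to discarding timestamps and keeping only the order of events, e.g.:

```python
def to_operational_time(events):
    """Map timestamped events to operational time: one event per step.

    `events` is a list of (time, electrode) pairs (hypothetical format for
    this sketch); the result is the electrode sequence in temporal order."""
    return [channel for _, channel in sorted(events)]

# Three events, out of order in the list, become an ordered sequence of
# three operational time steps.
seq = to_operational_time([(0.030, 'e2'), (0.010, 'e5'), (0.200, 'e1')])
```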
AUC (%) for the nextevent prediction task aggregated over all active output electrodes in a selected neuronal culture (1)
Res. size  10  30  50  100  300  500  1000 

Rfxd  66.2  71.6  73.7  79.4  82.1  82.9  83.3 
FFadp  71.3  77.9  79.8  81.5  82.5  83.3  83.1 
Radp  71.8  72.7  79.4  81.2  82.3  82.4  82.3 
Mean AUC (%) in the nextevent prediction task in 3 different cultures
Channel  Culture 1  Culture 2  Culture 3  

Rfxd  FFadp  Radp  Rfxd  FFadp  Radp  Rfxd  FFadp  Radp  
All  81.9  82.6  82.2  75.2  75.7  75.4  72.7  73.1  72.8 
Best  93.1  93.4  93.2  88.0  89.1  88.5  82.0  82.3  81.8 
Worst  54.7  56.3  55.2  62.4  60.4  62.5  50.5  54.3  54.2 
Pairwise comparison of the reservoir types and methods in the nextevent prediction task in 3 different cultures using 10 random reservoirs all with size 300
 Culture 1  Culture 2  Culture 3  

# output el.: 26  # output el.: 34  # output el.: 38  
Rfxd  FFadp  Radp  Rfxd  FFadp  Radp  Rfxd  FFadp  Radp  
Rfxd  0  5  5  0  6  5  0  6  11 
FFadp  11  0  8  22  0  20  12  0  17 
Radp  7  1  0  6  1  0  7  0  0 
7 Conclusion
The implications of our results are manifold. Firstly, our results indicate that reservoir computing is a promising candidate for modeling neural activity, including neural encoding and decoding. With their expressive power for different activity measures (e.g. spike rates, correlations, etc.), reservoir computing tools might help in the analysis of neural data. In our experiments, ESNs of leaky integrator neurons proved successful in modeling the response dynamics (e.g. stimulus-response relations and spatiotemporal dynamics) of simulated and cultured biological neural networks.
On the methodological side, we showed that ESN learning algorithms can be adapted for event data, such as spikes or spike groups, using a point process framework. We proposed a reservoir adaptation method for event data, which can be used to adapt the connectivity and time constants of the reservoir neurons. The experimental results indicate that reservoir adaptation can significantly improve ESN performance over readout-only training.
We utilized feedforward networks of leaky-integrator neurons as reservoirs with a predictive power comparable to recurrent reservoirs when trained with log-likelihood propagation. In modeling stimulus-response relations of simulated BNNs, feedforward reservoir adaptation significantly outperformed the other methods for reservoirs of up to 500 neurons. For the event timing prediction task in neuronal cultures, recurrent reservoir adaptation outperformed the other methods (in 2 of 3 cultures), whereas feedforward adaptation was better in the next-event prediction task in all 3 cultures. This might indicate that the type of encoding in neural systems (order or timing) favors different architectures for decoding. An analysis of the structure-coding relations, however, is beyond the scope of this note. Why feedforward reservoir adaptation can work better than recurrent reservoir adaptation also necessitates more theoretical analysis. We manually optimized the global parameters (spectral radius, learning rates and A) for recurrent fixed reservoirs; recurrent adaptive reservoirs started from these values. Although we can think of no obvious disadvantage for recurrent reservoirs, feedforward adaptation outperformed recurrent adaptation in some tasks. We experimentally showed that one-step propagation approximates the gradients better in feedforward reservoirs than in recurrent ones. In our opinion, the better structuring of the reservoir parameters, i.e. the separation of the memory parameters from the reservoir connectivity, might further underlie the relatively better performance of feedforward adaptation. For instance, a small connectivity change during recurrent adaptation might have a more dramatic effect on the reservoir memory than in the case of feedforward adaptation. Our findings on feedforward networks are also in accordance with recent work by Ganguli et al. (2008), Goldman (2009) and Murphy and Miller (2009), who show that stable fading memory can be realized by feedforward or functionally feedforward networks and that feedforward networks have desirable properties in terms of the trade-off between noise amplification and memory capacity.
Acknowledgements
This work was supported in part by the German BMBF (01GQ0420), ECNEURO (12788) and BFNT (01GQ0830) grants. The authors would like to thank Luc De Raedt, Herbert Jaeger and anonymous reviewers for fruitful discussions and helpful feedback. We also appreciate the help of Steffen Kandler, Samorah Okujeni and Oliver Weihberger with culture preparations and recordings.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
References
 Abeles, M., Bergman, H., Margalit, E., & Vaadia, E. (1993). Spatiotemporal firing patterns in the frontal cortex of behaving monkeys. Journal of Neurophysiology, 70, 1629–1638.PubMedGoogle Scholar
 Auer, P., Burgsteiner, H., & Maass, W. (2008). A learning rule for very simple universal approximators consisting of a single layer of perceptrons. Neural Networks, 21(5), 786–795.CrossRefPubMedGoogle Scholar
 Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning longterm dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 57–166.CrossRefGoogle Scholar
 Bottou, L. (2004). Stochastic learning. In O. Bousquet & U. von Luxburg (Eds.), Advanced lectures on machine learning. Lecture notes in artificial intelligence, LNAI (Vol. 3176, pp. 146–168). Berlin: Springer Verlag.CrossRefGoogle Scholar
 Bradley, A. P. (1997). The use of the area under the roc curve in the evaluation of machine learning algorithms. Pattern Recognition, 30(7), 1145–1159.CrossRefGoogle Scholar
 Brown, E. N., Barbieri, R., Eden, U. T., & Frank, L. M. (2003). Likelihood methods for neural spike train data analysis. In J. Feng (Ed.), Computational neuroscience: A comprehensive approach (chapter 9). London: CRC.Google Scholar
 Brown, E. N., Nguyen, D. P., Frank, L. M., Wilson, M. A., & Solo, V. (2001). An analysis of neural receptive field plasticity by point process adaptive filtering. PNAS, 98(21), 12261–12266.CrossRefPubMedGoogle Scholar
 Brown, R. E., & Milner, P. M. (2003). The legacy of Donald O. Hebb: More than the Hebb synapse. Nature Reviews Neuroscience, 4(12), 1013–1019.CrossRefPubMedGoogle Scholar
 Chao, Z. C., Bakkum, D. J., & Potter, S. M. (2008). Shaping embodied neural networks for adaptive goal-directed behavior. PLoS Computational Biology, 4(3), e1000042.
 Chatfield, C., & Collins, A. J. (2000). Introduction to multivariate analysis (reprint). Boca Raton: CRC Press.
 Chornoboy, E. S., Schramm, L. P., & Karr, A. (1988). Maximum likelihood identification of neuronal point process systems. Biological Cybernetics, 59(9), 265–275.
 Cox, D. R., & Isham, V. (1980). Point processes. In CRC monographs on statistics & applied probability. London: Chapman & Hall/CRC.
 Daley, D. J., & Vere-Jones, D. (2003). An introduction to the theory of point processes (2nd ed., Vol. 1). New York: Springer.
 Davison, A. C. (2003). Statistical models. Cambridge: Cambridge University Press.
 Egert, U., Knott, T., Schwarz, C., Nawrot, M., Brandt, A., Rotter, S., et al. (2002). MEA-Tools: An open source toolbox for the analysis of multi-electrode data with MATLAB. Journal of Neuroscience Methods, 117, 33–42.
 le Feber, J., Rutten, W. L. C., Stegenga, J., Wolters, P. S., Ramakers, G. J. A., & van Pelt, J. (2007). Conditional firing probabilities in cultured neuronal networks: A stable underlying structure in widely varying spontaneous activity patterns. Journal of Neural Engineering, 4(2), 54–67.
 Ganguli, S., Huh, D., & Sompolinsky, H. (2008). Memory traces in dynamical systems. Proceedings of the National Academy of Sciences of the United States of America, 105(48), 18970–18975.
 Goldman, M. S. (2009). Memory without feedback in a neural network. Neuron, 61(4), 621–634.
 Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. New York: Wiley.
 Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. Journal of Physiology (London), 117, 500–544.
 Izhikevich, E. M. (2006). Polychronization: Computation with spikes. Neural Computation, 18(2), 245–282.
 Jaeger, H. (2001). The "echo state" approach to analysing and training recurrent neural networks. GMD Report 148, German National Research Institute for Computer Science.
 Jaeger, H. (2003). Adaptive nonlinear system identification with echo state networks. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems (Vol. 15, pp. 593–600). Cambridge: MIT Press.
 Jaeger, H., & Eck, D. (2006). Can't get you out of my head: A connectionist model of cyclic rehearsal. In I. Wachsmuth & G. Knoblich (Eds.), ZiF workshop. Lecture notes in computer science (Vol. 4930, pp. 310–335). New York: Springer.
 Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667), 78–80.
 Jaeger, H., Lukosevicius, M., Popovici, D., & Siewert, U. (2007). Optimization and applications of echo state networks with leaky integrator neurons. Neural Networks, 20(3), 335–352.
 Jimbo, Y., Tateno, T., & Robinson, H. P. C. (1999). Simultaneous induction of pathway-specific potentiation and depression in networks of cortical neurons. Biophysical Journal, 76(2), 670–678.
 Kandel, E. R., Schwartz, J. H., & Jessell, T. M. (2000). Principles of neural science. London: McGraw-Hill.
 Lindsey, B. G., Morris, K. F., Shannon, R., & Gerstein, G. L. (1997). Repeated patterns of distributed synchrony in neuronal assemblies. Journal of Neurophysiology, 78(3), 1714–1719.
 Maass, W., Joshi, P., & Sontag, E. D. (2007). Computational aspects of feedback in neural circuits. PLoS Computational Biology, 3(1), e165.
 Maass, W., Natschläger, T., & Markram, H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), 2531–2560.
 Maass, W., Natschläger, T., & Markram, H. (2004). Fading memory and kernel properties of generic cortical microcircuit models. Journal of Physiology-Paris, 98(4–6), 315–330.
 Mainen, Z. F., & Sejnowski, T. J. (1995). Reliability of spike timing in neocortical neurons. Science, 268, 1503–1506.
 Mannila, H., Toivonen, H., & Verkamo, A. I. (1997). Discovery of frequent episodes in event sequences. Data Mining and Knowledge Discovery, 1(3), 259–289.
 Marom, S., & Shahaf, G. (2002). Development, learning and memory in large random networks of cortical neurons: Lessons beyond anatomy. Quarterly Reviews of Biophysics, 35(1), 63–87.
 McCullagh, P., & Nelder, J. (1989). Generalized linear models (2nd ed.). London: Chapman & Hall/CRC.
 Murphy, B. K., & Miller, K. D. (2009). Balanced amplification: A new mechanism of selective amplification of neural activity patterns. Neuron, 61(4), 635–648.
 Okatan, M., Wilson, M. A., & Brown, E. N. (2005). Analyzing functional connectivity using a network likelihood model of ensemble neural spiking activity. Neural Computation, 17(9), 1927–1961.
 Paninski, L. (2004). Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems, 15(4), 243–262.
 Pawitan, Y. (2001). In all likelihood: Statistical modelling and inference using likelihood. New York: Oxford University Press.
 Prut, Y., Vaadia, E., Bergman, H., Haalman, I., Slovin, H., & Abeles, M. (1998). Spatiotemporal structure of cortical activity: Properties and behavioral relevance. Journal of Neurophysiology, 79(6), 2857–2874.
 Rajaram, S., Graepel, T., & Herbrich, R. (2005). Poisson-networks: A model for structured point processes. In Proceedings of the AISTATS 2005 workshop.
 Riehle, A., Grün, S., Diesmann, M., & Aertsen, A. (1997). Spike synchronization and rate modulation differentially involved in motor cortical function. Science, 278, 1950–1953.
 Rieke, F., Warland, D., de Ruyter van Steveninck, R., & Bialek, W. (1999). Spikes: Exploring the neural code. In Computational neuroscience. Cambridge: MIT Press.
 Rolston, J. D., Wagenaar, D. A., & Potter, S. M. (2007). Precisely timed spatiotemporal patterns of neural activity in dissociated cortical cultures. Neuroscience, 148(1), 294–303.
 Ruaro, M. E., Bonifazi, P., & Torre, V. (2005). Toward the neurocomputer: Image processing and pattern recognition with neuronal cultures. IEEE Transactions on Biomedical Engineering, 52(3), 371–383.
 Shahaf, G., Eytan, D., Gal, A., Kermany, E., Lyakhov, V., Zrenner, C., et al. (2008). Order-based representation in random networks of cortical neurons. PLoS Computational Biology, 4(11), e1000228.
 Shahaf, G., & Marom, S. (2001). Learning in networks of cortical neurons. Journal of Neuroscience, 21(22), 8782–8788.
 Steil, J. (2004). Backpropagation-decorrelation: Online recurrent learning with O(N) complexity. In Proceedings of the IJCNN (Vol. 1, pp. 843–848).
 Steil, J. (2007). Online reservoir adaptation by intrinsic plasticity for backpropagation-decorrelation and echo state learning. Neural Networks, 20(3), 353–364.
 Villa, A. E. P., Tetko, I. V., Hyland, B., & Najem, A. (1999). Spatiotemporal activity patterns of rat cortical neurons predict responses in a conditioned task. Proceedings of the National Academy of Sciences, 96(3), 1106–1111.
 Wagenaar, D. A., DeMarse, T. B., & Potter, S. M. (2005). MeaBench: A toolset for multi-electrode data acquisition and on-line analysis. In Proceedings of the 2nd International IEEE EMBS Conference on Neural Engineering.
 Williams, R. J., & Zipser, D. (1989). A learning algorithm for continually running fully recurrent neural networks. Neural Computation, 1(2), 270–280.