Journal of Computational Neuroscience, Volume 29, Issue 1–2, pp 171–182

Kernel bandwidth optimization in spike rate estimation

Open Access


The kernel smoother and the time-histogram are classical tools for estimating an instantaneous rate of spike occurrences. We recently established a method for selecting the bin width of the time-histogram, based on the principle of minimizing the mean integrated squared error (MISE) between the estimated rate and the unknown underlying rate. Here we apply the same optimization principle to kernel density estimation in selecting the width, or “bandwidth”, of the kernel, and further extend the algorithm to allow a variable bandwidth, in conformity with data. The variable kernel has the potential to accurately grasp non-stationary phenomena, such as abrupt changes in the firing rate, which we often encounter in neuroscience. In order to avoid possible overfitting that may take place due to excessive freedom, we introduce a stiffness constant for bandwidth variability. Our method automatically adjusts the stiffness constant, thereby adapting to the entire set of spike data. We show that the classical kernel smoother may exhibit goodness-of-fit comparable to, or even better than, that of modern sophisticated rate estimation methods, provided that the bandwidth is selected properly for a given set of spike data, according to the optimization methods presented here.


Keywords: Kernel density estimation · Bandwidth optimization · Mean integrated squared error

1 Introduction

Neurophysiologists often investigate responses of a single neuron to a stimulus presented to an animal by using the discharge rate of action potentials, or spikes (Adrian 1928; Gerstein and Kiang 1960; Abeles 1982). One classical method for estimating spike rate is the kernel density estimation (Parzen 1962; Rosenblatt 1956; Sanderson 1980; Richmond et al. 1990; Nawrot et al. 1999). In this method, a spike sequence is convolved with a kernel function, such as a Gauss density function, to obtain a smooth estimate of the firing rate. The estimated rate is sometimes referred to as a spike density function. This nonparametric method is left with a free parameter, the kernel bandwidth, that determines the goodness-of-fit of the density estimate to the unknown rate underlying the data. Although theories have been suggested for selecting the bandwidth by cross-validating with the data (Rudemo 1982; Bowman 1984; Silverman 1986; Scott and Terrell 1987; Scott 1992; Jones et al. 1996; Loader 1999a, b), individual researchers have mostly chosen the bandwidth arbitrarily. This is partly because the theories have not spread to the neurophysiological community, and partly due to inappropriate basic assumptions of the theories themselves. Most optimization methods assume a stationary rate fluctuation, while the neuronal firing rate often exhibits abrupt changes, to which neurophysiologists, in particular, pay attention. A fixed bandwidth, optimized using a stationary assumption, is too wide to extract the details of sharp activation, while in the silent period, the fixed bandwidth would be too narrow and may cause spurious undulation in the estimated rate. It is therefore desirable to allow a variable bandwidth, in conformity with data.

The idea of optimizing bandwidth at every instant was proposed by Loftsgaarden and Quesenberry (1965). However, in contrast to the progress in methods that vary bandwidths at sample points only (Abramson 1982; Breiman et al. 1977; Sain and Scott 1996; Sain 2002; Brewer 2004), the local optimization of bandwidth at every instant turned out to be difficult because of its excessive freedom (Scott 1992; Devroye and Lugosi 2000; Sain and Scott 2002). In earlier studies, Hall and Schucany used the cross-validation of Rudemo and Bowman within local intervals (Hall and Schucany 1989), yet the interval length was left free. Fan et al. applied cross-validation to locally optimized bandwidth (Fan et al. 1996), yet the smoothness of the variable bandwidth was chosen manually.

In this study, we first revisit the fixed kernel method, and derive a simple formula to select the bandwidth of the kernel density estimation, similar to the previous method for selecting the bin width of a peristimulus time histogram (see Shimazaki and Shinomoto 2007). Next, we introduce the variable bandwidth into the kernel method and derive an algorithm for determining the bandwidth locally in time. The method automatically adjusts the flexibility, or the stiffness, of the variable bandwidth. The performance of our fixed and variable kernel methods is compared with established density estimation methods, in terms of the goodness-of-fit to underlying rates that vary either continuously or discontinuously. We also apply our kernel methods to biological data, and examine their performance by cross-validating with the data.

Though our methods are based on the classical kernel method, their performance is comparable to that of various sophisticated rate estimation methods. Because they build on the classical method, they are also convenient for users: the methods simply suggest a bandwidth for the standard kernel density estimation.

2 Methods

2.1 Kernel smoothing

In neurophysiological experiments, neuronal response is examined by repeatedly applying identical stimuli. The recorded spike trains are aligned at the onset of stimuli, and superimposed to form a raw density, as
$$ x_{t}=\frac{1}{n} \sum_{i=1}^{N}{{\delta\left( {t-t}_{i}\right) }}, \label{rawdensity} $$
where n is the number of repeated trials. Here, each spike is regarded as a point event that occurs at an instant of time \(t_i\) (\(i = 1, 2, \cdots, N\)) and is represented by the Dirac delta function \(\delta(t)\). The kernel density estimate is obtained by convolving the raw density \(x_t\) with a kernel \(k(s)\),
$$ \hat{\lambda}_{t}=\int x_{t-s} k{\left( {s}\right) }\,ds. \label{kerneldensity} $$
Throughout this study, the integral \(\int\) that does not specify bounds refers to \(\int_{-\infty}^{\infty}\). The kernel function satisfies the normalization condition, \(\int k(s) \,ds=1\), a zero first moment, \(\int s k(s) \, ds=0\), and has a finite bandwidth, \(w^2 = \int s^2 k(s) \,ds < \infty\). A frequently used kernel is the Gauss density function,
$$ k_{w}(s) = \frac{1}{\sqrt{2 \pi }w} \exp{\left( -\frac{s^2}{2 w^2} \right)}, \label{gaussdensity} $$
where the bandwidth w is specified as a subscript. In the body of this study, we develop optimization methods that apply generally to any kernel function, and derive a specific algorithm for the Gauss density function in the Appendix.
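As an illustration of Eqs. (1)–(3), the following minimal sketch evaluates the kernel rate estimate on a time grid. Substituting the raw density into Eq. (2) gives \(\hat{\lambda}_{t} = \frac{1}{n}\sum_{i} k_{w}(t-t_{i})\), which is what the code computes. The function names, the spike times, and the bandwidth value are hypothetical and chosen only for illustration.

```python
import math

def gauss_kernel(s, w):
    """Gauss density function k_w(s) of Eq. (3)."""
    return math.exp(-s * s / (2.0 * w * w)) / (math.sqrt(2.0 * math.pi) * w)

def kernel_rate(spike_times, n_trials, w, t_grid):
    """Rate estimate of Eq. (2): (1/n) * sum_i k_w(t - t_i) at each grid time."""
    return [sum(gauss_kernel(t - ti, w) for ti in spike_times) / n_trials
            for t in t_grid]

# Hypothetical data: spikes pooled over n = 2 trials (times in seconds).
spikes = [0.10, 0.12, 0.35, 0.40, 0.41, 0.80]
dt = 0.001
grid = [k * dt - 1.0 for k in range(3001)]      # -1 s to 2 s in 1 ms steps
rate = kernel_rate(spikes, n_trials=2, w=0.05, t_grid=grid)

# Because the kernel is normalized, the estimate integrates to N / n.
area = sum(rate) * dt
```

Here `area` should be close to N/n = 3 spikes per trial, which is a quick sanity check on the normalization.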

2.2 Mean integrated squared error optimization principle

Assuming that spikes are sampled from a stochastic process, we consider optimizing the estimate \(\hat{\lambda}_{t}\) to be closest to the unknown underlying rate \(\lambda_{t}\). Among several plausible optimizing principles, such as the Kullback-Leibler divergence or the Hellinger distance, we adopt, here, the mean integrated squared error (MISE) for measuring the goodness-of-fit of an estimate to the unknown underlying rate, as
$$ \mbox{MISE}=\int_{a}^{b}E(\hat{\lambda}_{t}-\lambda_{t})^{2} \,dt, \label{eq:MISE} $$
where E refers to the expectation with respect to the spike generation process under a given inhomogeneous rate \(\lambda_{t}\). It follows, by definition, that \(E x_{t} = \lambda_{t}\).
In deriving optimization methods, we assume the Poisson nature, so that spikes are randomly sampled at a given rate \(\lambda_{t}\). Spikes recorded from a single neuron are correlated within each sequence (Shinomoto et al. 2003, 2005, 2009). In the limit of a large number of spike trains, however, mixed spikes are statistically independent and the superimposed sequence can be approximated as a single inhomogeneous Poisson point process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005).

2.3 Selection of the fixed bandwidth

Given a kernel function such as Eq. (3), the density function Eq. (2) is uniquely determined for a raw density Eq. (1) of spikes obtained from an experiment. A bandwidth w of the kernel may alter the density estimate, and it can accordingly affect the goodness-of-fit of the density function \(\hat{\lambda}_{t}\) to the unknown underlying rate \(\lambda_{t}\). In this subsection, we consider applying a kernel of a fixed bandwidth w, and develop a method for selecting w that minimizes the MISE, Eq. (4).

The integrand of the MISE is decomposed into three parts: \(E\hat{\lambda}_{t}^{2}-2\lambda_{t}E\hat{\lambda}_{t}+\lambda_{t}^{2}\). Since the last component does not depend on the choice of a kernel, we subtract it from the MISE, then define a cost function as a function of the bandwidth w:
$$\begin{array}{rcl} C_{n}\left( w\right) & = & \mbox{MISE}-\int_{a}^{b}\lambda_{t} ^{2}\,dt \\ & =& \int_{a}^{b}E\hat{\lambda}_{t}^{2}\,dt-2\int_{a}^{b}\lambda_{t} E\hat{\lambda}_{t}\,dt.\label{eq:Cost_Function1} \end{array} $$
Rudemo and Bowman suggested the leave-one-out cross-validation to estimate the second term of Eq. (5) (Rudemo 1982; Bowman 1984). Here, we directly estimate the second term with the Poisson assumption (See also Shimazaki and Shinomoto 2007).
By noting that \(\lambda_{t} = E x_{t}\), the integrand of the second term in Eq. (5) is given as
$$ E x_{t} E \hat{\lambda}_{t} = E\big[x_{t}\hat{\lambda}_{t}\big] -E\big[(x_{t}-Ex_{t})\big(\hat{\lambda}_{t}-E\hat{\lambda}_{t}\big)\big], \label{correlation_decomposition} $$
from a general decomposition of covariance of two random variables. Using Eq. (2), the covariance (the second term of Eq. (6)) is obtained as
$$\begin{array}{rcl} &&{\kern-6pt} E\big[(x_{t}-Ex_{t})\big(\hat{\lambda}_{t}-E\hat{\lambda} _{t}\big)\big] \\ &&{\kern12pt} = \int k_w \left( {t-s}\right) E\left[ (x_{t}-Ex_{t}) \left( x_{s}-Ex_{s}\right) \right] \,ds \\ &&{\kern12pt} = \int{k}_{w}{\left( {t-s}\right) }\left[ \delta\left( t-s\right) \frac{1}{n}Ex_{s}\right] \,ds \\ &&{\kern12pt} =\frac{{1}}{n} {k}_{w}(0)Ex_{t}.\label{eq:covariance} \end{array} $$
Here, to obtain the second equality, we used the assumption of the Poisson point process (independent spikes).
Using Eqs. (6) and (7), Eq. (5) becomes
$$\begin{array}{rcl} C_{n}\left( w\right) &=&\int_{a}^{b}E\hat{\lambda}_{t}^{2}\,dt \\ && -2\int_{a}^{b}\left\{E\big[x_{t}\hat{\lambda}_{t}\big]-\frac{1}{n}k_{w}(0)Ex_{t}\right\} \,dt. \label{eq:Cost_Function_Fix2} \end{array} $$
Equation (8) is composed of observable variables only. Hence, from sample sequences, the cost function is estimated as
$$ \hat{C}_{n}\left( w\right) =\int_{a}^{b}\hat{\lambda}_{t}^{2}\,dt-2\int _{a}^{b}\left\{x_{t}\hat{\lambda}_{t}-\frac{1}{n}k_{w}(0)x_{t}\right\} \,dt. \label{eq:Cost_Function_Fix_Estimated1} $$
In terms of a kernel function, the cost function is written as
$$\begin{array}{rcl} \hat{C}_{n}\left( w\right) &=& \frac{1}{n^{2}}\sum\limits_{i,j}\psi_{w}\left( t_{i},t_{j}\right) \\ && -\frac{2}{n^{2}}\left\{\sum\limits_{i,j}k_{w}\left( t_{i}-t_{j}\right) -k_{w} (0)N\,\right\} \\ &=& \frac{1}{n^2} \sum\limits_{i,j}\psi_{w}\left( t_{i},t_{j}\right) -\frac{2}{n^2} \sum\limits_{i\neq j}k_{w}\left( t_{i}-t_{j}\right), \label{eq:Cost_Function_Fix_Estimated} \end{array} $$
where
$$ \psi_{w}\left( t_{i},t_{j}\right) =\int_{a}^{b} k_{w}\left( t-t_{i}\right) k_{w}\left( t-t_{j}\right) \,dt. \label{eq:phai_fix} $$
The minimizer of the cost function, Eq. (10), is an estimate of the optimal bandwidth, which is denoted by \(w^{*}\). The method for selecting a fixed kernel bandwidth is summarized in Algorithm 1. A particular algorithm developed for the Gauss density function is given in the Appendix.
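The fixed-bandwidth selection can be sketched as a grid search over Eq. (10). The sketch below assumes a Gauss kernel and, as a simplification, takes the integral in Eq. (11) over the whole real line rather than over [a, b]; under that assumption the product of two Gauss kernels integrates to \(\psi_{w}(t_{i},t_{j}) = k_{\sqrt{2}w}(t_{i}-t_{j})\). The function names, data, and candidate grid are hypothetical, and this is not a transcription of Algorithm 1.

```python
import math

def gauss_kernel(s, w):
    """Gauss density function k_w(s) of Eq. (3)."""
    return math.exp(-s * s / (2.0 * w * w)) / (math.sqrt(2.0 * math.pi) * w)

def fixed_cost(spike_times, n_trials, w):
    """Estimated cost of Eq. (10) for a Gauss kernel of bandwidth w.

    Assumption: the integral in Eq. (11) is taken over the real line,
    so psi_w(t_i, t_j) = k_{sqrt(2) w}(t_i - t_j).
    """
    psi_sum = 0.0
    cross_sum = 0.0
    for i, ti in enumerate(spike_times):
        for j, tj in enumerate(spike_times):
            psi_sum += gauss_kernel(ti - tj, math.sqrt(2.0) * w)
            if i != j:
                cross_sum += gauss_kernel(ti - tj, w)
    return (psi_sum - 2.0 * cross_sum) / n_trials ** 2

def select_fixed_bandwidth(spike_times, n_trials, candidates):
    """Return the candidate bandwidth with the smallest estimated cost."""
    costs = [fixed_cost(spike_times, n_trials, w) for w in candidates]
    best = min(range(len(candidates)), key=costs.__getitem__)
    return candidates[best], costs

# Hypothetical data: spikes pooled over n = 2 trials (times in seconds).
spikes = [0.10, 0.12, 0.35, 0.40, 0.41, 0.80]
candidates = [0.005 * k for k in range(1, 101)]   # 5 ms to 500 ms
w_opt, costs = select_fixed_bandwidth(spikes, 2, candidates)
```

The double loop costs O(N²) per candidate, which is adequate for a sketch; the result depends on the candidate grid, so the grid should bracket the plausible range of bandwidths.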

2.4 Selection of the variable bandwidth

The method described in Section 2.3 aims to select a single bandwidth that optimizes the goodness-of-fit of the rate estimate for an entire observation interval [a, b]. For a non-stationary case, in which the degree of rate fluctuation greatly varies in time, the rate estimation may be improved by using a kernel function whose bandwidth is adaptively selected in conformity with data. The spike rate estimated with the variable bandwidth \(w_{t}\) is given by
$$ \hat{\lambda}_{t}=\int x_{t-s} k_{w_{t}}\left( s \right)\,ds. $$
Here we select the variable bandwidth \(w_{t}\) as a fixed bandwidth optimized in a local interval. In this approach, the interval length for the local optimization regulates the shape of the function \(w_{t}\), and therefore determines the goodness-of-fit of the estimated rate to the underlying rate. We provide a method for obtaining the variable bandwidth \(w_{t}\) that minimizes the MISE by optimizing the local interval length.
To select an interval length for local optimization, we introduce the local MISE criterion at time t as
$$ \mbox{{\em local}MISE}=\int E\left(\hat{\lambda}_{u} - \lambda_{u}\right)^{2}\rho_{W} ^{u-t}du, $$
where \(\hat{\lambda}_{u} = \int x_{u-s} k_{w}(s) \,ds\) is an estimated rate with a fixed bandwidth w. Here, a weight function \(\rho _{W}^{u-t}\) localizes the integration of the squared error in a characteristic interval W centered at time t. An example of the weight function is once again the Gauss density function. See the Appendix for the specific algorithm for the Gauss weight function. As in Eq. (5), we introduce the local cost function at time t by subtracting the term irrelevant for the choice of w as
$$ C_{n}^{t}\left( w,W\right) =\mbox{{\em local}MISE}-\int\lambda_{u} ^{2}\rho\,_{W}^{u-t} \,du. $$
The optimal fixed bandwidth \(w^{*}\) is obtained as a minimizer of the estimated cost function:
$$ \begin{array}{rcl} \hat{C}_{n}^t\left( w,W\right) &=&\frac{1}{n^2}\sum\limits_{i,j}\psi_{w,W}^{t}\left( t_{i} ,t_{j}\right) \\ && -\frac{2}{n^2} \sum\limits_{i\neq j}k_{w}\left( t_{i}-t_{j}\right) \rho _{W}^{t_{i}-t}, \label{eq:CostFunction_Local_Estimated} \end{array} $$
$$ \psi_{w,W}^{t}\left( t_{i},t_{j}\right) =\int k_{w}\left( u-t_{i}\right) k_{w}\left( u-t_{j}\right) \rho_{W}^{u-t}du.\label{eq:phai_local} $$
The derivation follows the same steps as in the previous section. Depending on the interval length W, the optimal bandwidth \(w^{*}\) varies. We suggest selecting an interval length that scales with the optimal bandwidth as \(\gamma^{-1} w^{*}\). The parameter \(\gamma\) regulates the interval length for local optimization: with small \(\gamma\) (\(\ll 1\)), the fixed bandwidth is optimized within a long interval; with large \(\gamma\) (\(\sim 1\)), the fixed bandwidth is optimized within a short interval. The interval length and fixed bandwidth, selected at time t, are denoted as \(W_{t}^{\gamma}\) and \(\bar{w}_{t}^{\gamma}\).
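The local optimization at a single time t can be sketched as follows, assuming a Gauss kernel and a Gauss weight function. Under these assumptions the integral in Eq. (16) has the closed form \(\psi_{w,W}^{t}(t_{i},t_{j}) = k_{\sqrt{2}w}(t_{i}-t_{j})\, k_{\sqrt{w^{2}/2+W^{2}}}\big((t_{i}+t_{j})/2 - t\big)\), which follows from completing the square in the product of Gaussians. The search over candidate interval lengths is one possible reading of the rule \(W = \gamma^{-1} w^{*}\) and is not taken from Algorithm 2; all names and values are hypothetical.

```python
import math

def gauss(s, w):
    """Gauss density function with standard deviation w."""
    return math.exp(-s * s / (2.0 * w * w)) / (math.sqrt(2.0 * math.pi) * w)

def psi_local(ti, tj, w, W, t):
    """Eq. (16) for a Gauss kernel and a Gauss weight, in closed form."""
    spread = math.sqrt(0.5 * w * w + W * W)
    return gauss(ti - tj, math.sqrt(2.0) * w) * gauss(0.5 * (ti + tj) - t, spread)

def local_cost(spikes, n_trials, w, W, t):
    """Estimated local cost of Eq. (15) at time t."""
    total = 0.0
    for i, ti in enumerate(spikes):
        for j, tj in enumerate(spikes):
            total += psi_local(ti, tj, w, W, t)
            if i != j:
                total -= 2.0 * gauss(ti - tj, w) * gauss(ti - t, W)
    return total / n_trials ** 2

def local_bandwidth(spikes, n_trials, t, gamma, w_candidates, W_candidates):
    """Return (W, w*) for which w* is closest to gamma * W (assumed search rule)."""
    best = None
    for W in W_candidates:
        w_star = min(w_candidates,
                     key=lambda w: local_cost(spikes, n_trials, w, W, t))
        gap = abs(w_star - gamma * W)
        if best is None or gap < best[0]:
            best = (gap, W, w_star)
    return best[1], best[2]

# Hypothetical data and candidate grids (seconds).
spikes = [0.10, 0.12, 0.35, 0.40, 0.41, 0.80]
w_candidates = [0.01 * k for k in range(1, 41)]
W_candidates = [0.05 * k for k in range(1, 21)]
W_t, w_t = local_bandwidth(spikes, 2, 0.4, 0.5, w_candidates, W_candidates)
```

Repeating this search over a grid of times t yields the locally optimized bandwidths and interval lengths for a given stiffness constant.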
The locally optimized bandwidth \(\bar{w}_{t}^{\gamma}\) is repeatedly obtained for different \(t \in [a,b]\). Because the intervals overlap, we adopt the Nadaraya-Watson kernel regression (Nadaraya 1964; Watson 1964) of \(\bar {w}_{t}^{\gamma}\) as a local bandwidth at time t:
$$ w^\gamma_{t}=\left. \int\rho_{W_{s}^\gamma}^{t-s}\bar{w}_{s}^\gamma \,ds\right/ \int\rho_{W_{s}^\gamma}^{t-s}\,ds. \label{eq:Nadaraya-Watson} $$
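A discretized version of Eq. (17) can be sketched as below, assuming a Gauss weight function and a uniform time grid, so that the grid spacing cancels between numerator and denominator. The function and argument names are hypothetical.

```python
import math

def gauss(s, w):
    """Gauss density function with standard deviation w."""
    return math.exp(-s * s / (2.0 * w * w)) / (math.sqrt(2.0 * math.pi) * w)

def smooth_bandwidth(t_grid, local_w, local_W):
    """Nadaraya-Watson regression of Eq. (17) on a uniform grid.

    local_w[k] and local_W[k] are the locally optimized bandwidth and
    interval length at t_grid[k].
    """
    smoothed = []
    for t in t_grid:
        weights = [gauss(t - s, W) for s, W in zip(t_grid, local_W)]
        numerator = sum(wt * wb for wt, wb in zip(weights, local_w))
        smoothed.append(numerator / sum(weights))
    return smoothed

# Hypothetical input: a step in the local bandwidth, smoothed by the regression.
t_grid = [0.01 * k for k in range(101)]
local_w = [0.02 if t < 0.5 else 0.10 for t in t_grid]
local_W = [0.05] * len(t_grid)
w_smooth = smooth_bandwidth(t_grid, local_w, local_W)
```

Each smoothed value is a weighted average of the local bandwidths, so it always lies between their minimum and maximum.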
The variable bandwidth \(w_{t}^{\gamma}\) obtained from the same data, but with different \(\gamma\), exhibits different degrees of smoothness: with small \(\gamma\) (\(\ll 1\)), the variable bandwidth fluctuates slightly; with large \(\gamma\) (\(\sim 1\)), the variable bandwidth fluctuates significantly. The parameter \(\gamma\) is thus a smoothing parameter for the variable bandwidth. Similar to the fixed bandwidth, the goodness-of-fit of the variable bandwidth can be estimated from the data. The cost function for the variable bandwidth selected with \(\gamma\) is obtained as
$$ \hat{C_n}\left( \gamma \right) =\int_{a}^{b}\hat{\lambda}_{t}^{2} \,dt- \frac{2}{n^2}\sum_{i\neq j}k_{w_{t_i}^\gamma}\left( t_{i}-t_{j}\right) ,\label{eq:CostFunction_Variable_Estimated} $$
where \(\hat{\lambda}_{t}=\int x_{t-s} k_{w_t^\gamma}\left( s\right)ds\) is an estimated rate, with the variable bandwidth \(w_t^\gamma\). The integral is calculated numerically. With the stiffness constant \(\gamma^{*}\) that minimizes Eq. (18), local optimization is performed in an ideal interval length. The method for optimizing the variable kernel bandwidth is summarized in Algorithm 2.
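Given a bandwidth function sampled on a time grid, Eq. (18) can be evaluated as in the following sketch. It assumes a Gauss kernel, a uniform grid wide enough to cover the support of the estimate, a Riemann sum for the integral, and linear interpolation to read the bandwidth at each spike time; these numerical choices and all names are assumptions for illustration.

```python
import bisect
import math

def gauss(s, w):
    """Gauss density function with standard deviation w."""
    return math.exp(-s * s / (2.0 * w * w)) / (math.sqrt(2.0 * math.pi) * w)

def interp(t, grid, values):
    """Linear interpolation on a sorted grid, clamped at both ends."""
    if t <= grid[0]:
        return values[0]
    if t >= grid[-1]:
        return values[-1]
    k = bisect.bisect_right(grid, t)
    a = (t - grid[k - 1]) / (grid[k] - grid[k - 1])
    return (1.0 - a) * values[k - 1] + a * values[k]

def variable_rate(spikes, n_trials, t_grid, w_grid):
    """Rate estimate with bandwidth w_grid[k] applied at time t_grid[k]."""
    return [sum(gauss(t - ti, w) for ti in spikes) / n_trials
            for t, w in zip(t_grid, w_grid)]

def variable_cost(spikes, n_trials, t_grid, w_grid):
    """Estimated cost of Eq. (18) for a given bandwidth function."""
    rate = variable_rate(spikes, n_trials, t_grid, w_grid)
    dt = t_grid[1] - t_grid[0]
    integral = sum(r * r for r in rate) * dt     # Riemann sum of the squared rate
    cross = 0.0
    for i, ti in enumerate(spikes):
        w_i = interp(ti, t_grid, w_grid)         # bandwidth at the spike time
        for j, tj in enumerate(spikes):
            if i != j:
                cross += gauss(ti - tj, w_i)
    return integral - 2.0 * cross / n_trials ** 2

# Hypothetical example: bandwidth narrowing linearly over the observation window.
spikes = [0.10, 0.12, 0.35, 0.40, 0.41, 0.80]
t_grid = [0.001 * k - 0.5 for k in range(2001)]          # -0.5 s to 1.5 s
w_grid = [0.08 - 0.02 * (t + 0.5) for t in t_grid]       # 80 ms down to 40 ms
cost = variable_cost(spikes, 2, t_grid, w_grid)
```

Evaluating this cost for the bandwidth functions obtained with several values of the stiffness constant, and keeping the smallest, corresponds to the selection of \(\gamma^{*}\) described above.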

3 Results

3.1 Comparison of the fixed and variable kernel methods

By using spikes sampled from an inhomogeneous Poisson point process, we examined the efficiency of the kernel methods in estimating the underlying rate. We also used a sequence obtained by superimposing ten non-Poissonian (gamma) sequences (Shimokawa and Shinomoto 2009), but there was practically no significant difference in the rate estimation from the Poissonian sequence.
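Spike data of this kind can be generated numerically in several ways; the sketch below uses thinning of a homogeneous Poisson process, which is a standard technique and not necessarily the procedure used for the figures. The rate profile, its upper bound, the number of trials, and the function names are hypothetical.

```python
import math
import random

def sample_inhomogeneous_poisson(rate, rate_max, t_start, t_end, rng):
    """Draw spike times on [t_start, t_end) by thinning.

    rate_max must bound rate(t) from above on the whole interval.
    """
    spikes = []
    t = t_start
    while True:
        t += rng.expovariate(rate_max)            # next candidate event
        if t >= t_end:
            return spikes
        if rng.random() * rate_max < rate(t):     # keep with prob. rate(t) / rate_max
            spikes.append(t)

def sinusoidal_rate(t):
    """Hypothetical underlying rate in spikes/s."""
    return 20.0 + 10.0 * math.sin(2.0 * math.pi * t)

rng = random.Random(0)
n_trials = 10
trials = [sample_inhomogeneous_poisson(sinusoidal_rate, 30.0, 0.0, 2.0, rng)
          for _ in range(n_trials)]
pooled = sorted(t for trial in trials for t in trial)   # superimposed sequence
```

With this rate profile the expected number of pooled spikes is 10 × 40 = 400, and the pooled sequence can be passed directly to a kernel rate estimator.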

Figure 1 displays the result of the fixed kernel method based on the Gauss density function. The kernel bandwidth selected by Algorithm 1 applies a reasonable filtering to the set of spike sequences. Figure 1(d) shows that a cost function, Eq. (10), estimated from the spike data is similar to the original MISE, Eq. (4), which was computed using the knowledge of the underlying rate. This demonstrates that MISE optimization can, in practice, be carried out by our method, even without knowing the underlying rate.

Figure 2(a) demonstrates how the rate estimation is altered by replacing the fixed kernel method with the variable kernel method (Algorithm 2), for identical spike data (Fig. 1(b)). The Gauss weight function is used to obtain a smooth variable bandwidth. The manner in which the optimized bandwidth varies in the time axis is shown in Fig. 2(b): the bandwidth is short in a moment of sharp activation, and is long in the period of smooth rate modulation. Eventually, the sharp activation is grasped minutely and slow modulation is expressed without spurious fluctuations. The stiffness constant γ for the bandwidth variation is selected by minimizing the cost function, as shown in Fig. 2(c).
Fig. 1

Fixed kernel density estimation. (a) The underlying spike rate \(\lambda_{t}\) of the Poisson point process. (b) 20 spike sequences sampled from the underlying rate, and (c) Kernel rate estimates made with three types of bandwidth: too small, optimal, and too large. The gray area indicates the underlying spike rate. (d) The cost function for bandwidth w. The solid line is the estimated cost function, Eq. (10), computed from the spike data; the dashed line is the exact cost function, Eq. (5), directly computed by using the known underlying rate

Fig. 2

Variable kernel density estimation. (a) Kernel rate estimates. The solid and dashed lines are rate estimates made by the variable and fixed kernel methods for the spike data of Fig. 1(b). The gray area is the underlying rate. (b) Optimized bandwidths. The solid line is the variable bandwidth determined with the optimized stiffness constant \(\gamma^{*} = 0.8\), selected by Algorithm 2; the dashed line is the fixed bandwidth selected by Algorithm 1. (c) The cost function for bandwidth stiffness constant. The solid line is the cost function for the bandwidth stiffness constant \(\gamma\), Eq. (18), estimated from the spike data; the dashed line is the cost function computed from the known underlying rate

3.2 Comparison with established density estimation methods

We wish to examine the fitting performance of the fixed and variable kernel methods in comparison with established density estimation methods, by paying attention to their aptitudes for either continuous or discontinuous rate processes. Figure 3(a) shows the results for sinusoidal and sawtooth rate processes, as samples of continuous and discontinuous processes, respectively. We also examined triangular and rectangular rate processes as different samples of continuous and discontinuous processes, but the results were similar. The goodness-of-fit of the density estimate to the underlying rate is evaluated in terms of integrated squared error (ISE) between them.
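For concreteness, the ISE between an estimated and an underlying rate sampled on a common uniform grid can be approximated by a Riemann sum, as in this minimal sketch. The function name and the example values are hypothetical.

```python
def integrated_squared_error(estimated, underlying, dt):
    """Riemann-sum approximation of the ISE on a uniform grid with spacing dt."""
    return sum((e - u) ** 2 for e, u in zip(estimated, underlying)) * dt

# Hypothetical example: an estimate that is 1 spike/s too high over 1 s.
dt = 0.01
underlying = [20.0] * 100
estimated = [21.0] * 100
ise = integrated_squared_error(estimated, underlying, dt)
```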
Fig. 3

Fitting performances of the six rate estimation methods, histogram, fixed kernel, variable kernel, Abramson’s adaptive kernel, Locfit, and Bayesian adaptive regression splines (BARS). (a) Two rate profiles (2 [s]) used in generating spikes (gray area), and the estimated rates using six different methods. The raster plot in each panel is sample spike data (n = 10, superimposed). (b) Comparison of the six rate estimation methods in their goodness-of-fit, based on the integrated squared error (ISE) between the underlying and estimated rate. The abscissa and the ordinate are the ISEs of each method applied to sinusoidal and sawtooth underlying rates (10 [s]). The mean and standard deviation of performance evaluated using 20 data sets are plotted for each method

The established density estimation methods examined for comparison are the histogram (Shimazaki and Shinomoto 2007), Abramson’s adaptive kernel (Abramson 1982), Locfit (Loader 1999b), and Bayesian adaptive regression splines (BARS) (DiMatteo et al. 2001; Kass et al. 2003) methods, whose details are summarized below.

A histogram method, which is often called a peristimulus time histogram (PSTH) in neurophysiological literature, is the most basic method for estimating the spike rate. To optimize the histogram, we used a method proposed for selecting the bin width based on the MISE principle (Shimazaki and Shinomoto 2007).

Abramson’s adaptive kernel method (Abramson 1982) uses the sample point kernel estimate \(\hat \lambda_t = \sum_i k_{w_{t_i}} (t - t_i)\), in which the bandwidths are adapted at the sample points. Scaling the bandwidths as \(w_{t_i} = w \, (g / \hat \lambda_{t_i} )^{1/2} \) was suggested, where w is a pilot bandwidth, \(g= ( \prod\nolimits_i \hat \lambda_{t_i} )^{1/N}\), and \(\hat\lambda_t\) is a fixed kernel estimate with w. Abramson’s method is a two-stage method, in which the pilot bandwidth needs to be selected beforehand. Here, the pilot bandwidth is selected using the fixed kernel optimization method developed in this study.
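The two-stage procedure can be sketched as follows, assuming a Gauss kernel. The pilot bandwidth is passed in as an argument, and the final estimate is divided by the number of trials to match the normalization of Eq. (2); that division and all names are assumptions of this sketch.

```python
import math

def gauss(s, w):
    """Gauss density function with standard deviation w."""
    return math.exp(-s * s / (2.0 * w * w)) / (math.sqrt(2.0 * math.pi) * w)

def abramson_bandwidths(spikes, pilot_w):
    """Sample-point bandwidths w_{t_i} = w * (g / pilot_i)^(1/2)."""
    pilot = [sum(gauss(ti - tj, pilot_w) for tj in spikes) for ti in spikes]
    log_g = sum(math.log(p) for p in pilot) / len(pilot)   # log of geometric mean
    g = math.exp(log_g)
    return [pilot_w * math.sqrt(g / p) for p in pilot]

def abramson_rate(spikes, n_trials, pilot_w, t_grid):
    """Sample-point kernel estimate with one bandwidth per spike."""
    widths = abramson_bandwidths(spikes, pilot_w)
    return [sum(gauss(t - ti, wi) for ti, wi in zip(spikes, widths)) / n_trials
            for t in t_grid]

# Hypothetical data: a tight cluster of spikes and one isolated spike.
spikes = [0.10, 0.11, 0.12, 0.90]
widths = abramson_bandwidths(spikes, pilot_w=0.05)
t_grid = [0.001 * k - 0.5 for k in range(2001)]
rate = abramson_rate(spikes, 1, 0.05, t_grid)
```

By construction the geometric mean of the sample-point bandwidths equals the pilot bandwidth, and spikes in sparse regions receive wider kernels than spikes in dense regions.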

The Locfit algorithm developed by Loader (1999b) fits a polynomial to a log-density function under the principle of maximizing a locally defined likelihood. We examined the automatic choice of the adaptive bandwidth of the local likelihood, and found that the default fixed method yielded a significantly better fit. We used a nearest neighbor based bandwidth method, with a parameter covering 20% of the data.

The BARS (DiMatteo et al. 2001; Kass et al. 2003) is a spline-based adaptive regression method on an exponential family response model, including a Poisson count distribution. The rate estimated with the BARS is the expected splines computed from the posterior distribution on the knot number and locations with a Markov chain Monte Carlo method. The BARS is, thus, capable of smoothing a noisy histogram without missing abrupt changes. To create an initial histogram, we used 4 [ms] bin width, which is small enough to examine rapid changes in the firing rate.

Figure 3(a) displays the density profiles of the six different methods estimated from an identical set of spike trains (n = 10) that are numerically sampled from a sinusoidal or sawtooth underlying rate (2 [s]). Figure 3(b) summarizes the goodness-of-fit of the six methods to the sinusoidal and sawtooth rates (10 [s]) by averaging over 20 realizations of samples.

For the sinusoidal rate function, representing continuously varying rate processes, the BARS is most efficient in terms of ISE performance. For the sawtooth rate function, representing discontinuous non-stationary rate processes, the variable kernel estimation developed here is the most efficient in grasping abrupt rate changes. The histogram method is always inferior to the other five methods in terms of ISE performance, due to the jagged nature of the piecewise constant function.

3.3 Application to experimental data

We examine, here, the fixed and variable kernel methods in their applicability to real biological data. In particular, the kernel methods are applied to the spike data of an MT neuron responding to a random dot stimulus (Britten et al. 2004). The rates estimated from n = 1, 10, and 30 experimental trials are shown in Fig. 4. Fine details of rate modulation are revealed as we increase the sample size (Bair and Koch 1996). The fixed kernel method tends to choose narrower bandwidths, while the variable kernel method tends to choose wider bandwidths in the periods in which spikes are not abundant.
Fig. 4

Application of the fixed and variable kernel methods to spike data of an MT neuron (j024 with coherence 51.2% in nsa2004.1 (Britten et al. 2004)). (a–c): Analyses of n = 1, 10, and 30 spike trains; (top) Spike rates [spikes/s] estimated with the fixed and variable kernel methods are represented by the gray area and solid line, respectively; (middle) optimized fixed and variable bandwidths [s] are represented by dashed and solid lines, respectively; (bottom) A raster plot of the spike data used in the estimation. (d) Comparison of the two kernel methods. Bars represent the difference between the cross-validated cost functions, Eq. (19), of the fixed and variable kernel methods (fixed minus variable). A positive difference indicates superior fitting performance of the variable kernel method. The cross-validated cost function is obtained as follows. Whole spike sequences (\(n_{\text{total}}=60\)) are divided into \(n_{\text{total}}/n\) blocks, each composed of n (= 1, 5, 20, 30) spike sequences. A bandwidth was selected using spike data of a training block. The cross-validated cost functions, Eq. (19), for the selected bandwidth are computed using the \(n_{\text{total}}/n -1\) leftover test blocks, and their average is computed. The cost function is repeatedly obtained \(n_{\text{total}}/n\) times by changing the training block. The mean and standard deviation, computed from \(n_{\text{total}}/n\) samples, are displayed

The performance of the rate estimation methods is cross-validated. The bandwidth, denoted as \(w_{t}\) for both fixed and variable, is obtained with a training data set of n trials. The error is evaluated by computing the cost function, Eq. (18), in a cross-validatory manner:
$${{\hat C}_n}\left( w_t \right) = \int_a^b {\hat \lambda _t^2} \, dt - \frac{2}{{{n^2}}}\sum\limits_{i \ne j} {{k_{w_{{t^\prime_i}}^{}}}} \left( {{t^\prime_i} - {t^\prime_j}} \right), \label{eq:CrossValidated_Costfunction} $$
where the test spike times \(\{t^\prime_i\}\) are obtained from n spike sequences in the leftovers, and \(\hat \lambda _t \!=\! \frac{1}{n} \sum_i k_{w_t} (t \!-\! t^\prime_i)\). Figure 4(d) shows the performance improvements by the variable bandwidth over the fixed bandwidth, as evaluated by Eq. (19). The fixed and variable kernel methods perform better for smaller and larger sizes of data, respectively. In addition, we compared the fixed kernel method and the BARS by cross-validating the log-likelihood of a Poisson process with the rate estimated using the two methods. The difference in the log-likelihoods was not statistically significant for small samples (n = 1, 5 and 10), while the fixed kernel method fitted better to the spike data with larger samples (n = 20 and 30).
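The block-wise cross-validation can be sketched for the fixed kernel method as follows. The sketch assumes a Gauss kernel and integration over the whole real line, in which case Eq. (19) with a fixed bandwidth reduces to the cost of Eq. (10) evaluated on the test spikes with the bandwidth chosen on the training block. The data, candidate grid, and function names are hypothetical.

```python
import math

def gauss(s, w):
    """Gauss density function with standard deviation w."""
    return math.exp(-s * s / (2.0 * w * w)) / (math.sqrt(2.0 * math.pi) * w)

def fixed_cost(spikes, n_trials, w):
    """Cost of Eq. (10) with psi integrated over the real line (assumption)."""
    total = 0.0
    for i, ti in enumerate(spikes):
        for j, tj in enumerate(spikes):
            total += gauss(ti - tj, math.sqrt(2.0) * w)
            if i != j:
                total -= 2.0 * gauss(ti - tj, w)
    return total / n_trials ** 2

def cross_validated_costs(trials, n, candidates):
    """One cross-validated cost per training block of n trials."""
    blocks = [[t for trial in trials[k:k + n] for t in trial]
              for k in range(0, len(trials) - n + 1, n)]
    results = []
    for a, train in enumerate(blocks):
        # Select the bandwidth on the training block only.
        w = min(candidates, key=lambda c: fixed_cost(train, n, c))
        # Evaluate that bandwidth on every leftover block and average.
        tests = [fixed_cost(test, n, w)
                 for b, test in enumerate(blocks) if b != a]
        results.append(sum(tests) / len(tests))
    return results

# Hypothetical data: four trials, grouped into two blocks of n = 2.
trials = [[0.11, 0.35, 0.42], [0.09, 0.40, 0.77],
          [0.12, 0.38, 0.45, 0.80], [0.10, 0.33, 0.41]]
candidates = [0.01 * k for k in range(1, 31)]
cv_costs = cross_validated_costs(trials, 2, candidates)
```

The mean and standard deviation of `cv_costs` summarize the out-of-sample fit; the same loop could be applied to a variable bandwidth by swapping in the corresponding selection and cost functions.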

4 Discussion

In this study, we developed methods for selecting the kernel bandwidth in the spike rate estimation based on the MISE minimization principle. In addition to the principle of optimizing a fixed bandwidth, we further considered selecting the bandwidth locally in time, assuming a non-stationary rate modulation.

We tested the efficiency of our methods using spike sequences numerically sampled from a given rate (Figs. 1 and 2). Various density estimators constructed on different optimization principles were compared in their goodness-of-fit to the underlying rate (Fig. 3). There is in fact no oracle that selects one among various optimization principles, such as MISE minimization or likelihood maximization. Practically, reasonable principles render similar detectability for rate modulation; the kernel methods based on MISE were roughly comparable to the Locfit based on likelihood maximization in their performances. The difference of the performances is not due to the choice of principles, but rather due to techniques; kernel and histogram methods lead to completely different results under the same MISE minimization principle (Fig. 3(b)). Among the smooth rate estimators, the BARS was good at representing continuously varying rate, while the variable kernel method was good at grasping abrupt changes in the rate process (Fig. 3(b)).

We also examined the performance of our methods in application to neuronal spike sequences by cross-validating with the data (Fig. 4). The result demonstrated that the fixed kernel method performed well in small samples. We refer to Cunningham et al. (2008) for a result on the superior fitting performance of a fixed kernel to small samples in comparison with the Locfit and BARS, as well as the Gaussian process smoother (Cunningham et al. 2008; Smith and Brown 2003; Koyama and Shinomoto 2005). The adaptive methods, however, have the potential to outperform the fixed method with larger samples derived from a non-stationary rate profile (See also Endres et al. 2008 for comparisons of their adaptive histogram with the fixed histogram and kernel method). The result in Fig. 4 confirmed the utility of our variable kernel method for larger samples of neuronal spikes.

We derived the optimization methods under the Poisson assumption, so that spikes are randomly drawn from a given rate. If one wishes to estimate spike rate of a single or a few sequences that contain strongly correlated spikes, it is desirable to utilize the information as to non-Poisson nature of a spike train (Cunningham et al. 2008). Note that a non-Poisson spike train may be dually interpreted, as being derived either irregularly from a constant rate, or regularly from a fluctuating rate (Koyama and Shinomoto 2005; Shinomoto and Koyama 2007). However, a sequence obtained by superimposing many spike trains is approximated as a Poisson process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005), for which dual interpretation does not occur. Thus the kernel methods developed in this paper are valid for the superimposed sequence, and serve as the peristimulus density estimator for spike trains aligned at the onset or offset of the stimulus.

The kernel smoother is a classical method for estimating the firing rate, as popular as the histogram method. We have shown in this paper that the classical kernel methods perform well in the goodness-of-fit to the underlying rate. They are not only superior to the histogram method, but also comparable to modern sophisticated methods, such as the Locfit and BARS. In particular, the variable kernel method outperformed competing methods in representing abrupt changes in the spike rate, which we often encounter in neuroscience. Given its simplicity and familiarity, the kernel smoother can still be the most useful tool for analyzing spike data, provided that the bandwidth is chosen appropriately as instructed in this paper.



Acknowledgments

We thank M. Nawrot, S. Koyama, D. Endres for valuable discussions, and the Diesmann Unit for providing the computing environment. We also acknowledge K. H. Britten, M. N. Shadlen, W. T. Newsome, and J. A. Movshon, who made their data available to the public, and W. Bair for hosting the Neural Signal Archive. This study is supported in part by a Research Fellowship of the Japan Society for the Promotion of Science for Young Scientists to HS and Grants-in-Aid for Scientific Research to SS from the MEXT Japan (20300083, 20020012).

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.


References

  1. Abeles, M. (1982). Quantification, smoothing, and confidence-limits for single-units histograms. Journal of Neuroscience Methods, 5(4), 317–325.
  2. Abramson, I. (1982). On bandwidth variation in kernel estimates-a square root law. The Annals of Statistics, 10(4), 1217–1223.
  3. Adrian, E. (1928). The basis of sensation: The action of the sense organs. New York: W.W. Norton.
  4. Bair, W., & Koch, C. (1996). Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey. Neural Computation, 8(6), 1185–1202.
  5. Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71(2), 353.
  6. Breiman, L., Meisel, W., & Purcell, E. (1977). Variable kernel estimates of multivariate densities. Technometrics, 19, 135–144.
  7. Brewer, M. J. (2004). A Bayesian model for local smoothing in kernel density estimation. Statistics and Computing, 10, 299–309.
  8. Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (2004). Responses of single neurons in macaque MT/V5 as a function of motion coherence in stochastic dot stimuli. The Neural Signal Archive. nsa2004.1.
  9. Cox, R. D. (1962). Renewal theory. London: Wiley.
  10. Cunningham, J., Yu, B., Shenoy, K., Sahani, M., Platt, J., Koller, D., et al. (2008). Inferring neural firing rates from spike trains using Gaussian processes. Advances in Neural Information Processing Systems, 20, 329–336.
  11. Daley, D., & Vere-Jones, D. (1988). An introduction to the theory of point processes. New York: Springer.
  12. Devroye, L., & Lugosi, G. (2000). Variable kernel estimates: On the impossibility of tuning the parameters. In E. Giné, D. Mason, & J. A. Wellner (Eds.), High dimensional probability II (pp. 405–442). Boston: Birkhauser.
  13. DiMatteo, I., Genovese, C. R., & Kass, R. E. (2001). Bayesian curve-fitting with free-knot splines. Biometrika, 88(4), 1055–1071.
  14. Endres, D., Oram, M., Schindelin, J., & Foldiak, P. (2008). Bayesian binning beats approximate alternatives: Estimating peristimulus time histograms. Advances in Neural Information Processing Systems, 20, 393–400.
  15. Fan, J., Hall, P., Martin, M. A., & Patil, P. (1996). On local smoothing of nonparametric curve estimators. Journal of the American Statistical Association, 91, 258–266.
  16. Gerstein, G. L., & Kiang, N. Y. S. (1960). An approach to the quantitative analysis of electrophysiological data from single neurons. Biophysical Journal, 1(1), 15–28.
  17. Hall, P., & Schucany, W. R. (1989). A local cross-validation algorithm. Statistics & Probability Letters, 8(2), 109–117.
  18. Jones, M., Marron, J., & Sheather, S. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91(433), 401–407.
  19. Kass, R. E., Ventura, V., & Brown, E. N. (2005). Statistical issues in the analysis of neuronal data. Journal of Neurophysiology, 94(1), 8–25.
  20. Kass, R. E., Ventura, V., & Cai, C. (2003). Statistical smoothing of neuronal data. Network-Computation in Neural Systems, 14(1), 5–15.
  21. Koyama, S., & Shinomoto, S. (2005). Empirical Bayes interpretations of random point events. Journal of Physics A-Mathematical and General, 38, 531–537.
  22. Loader, C. (1999a). Bandwidth selection: Classical or plug-in? The Annals of Statistics, 27(2), 415–438.
  23. Loader, C. (1999b). Local regression and likelihood. New York: Springer.
  24. Loftsgaarden, D. O., & Quesenberry, C. P. (1965). A nonparametric estimate of a multivariate density function. The Annals of Mathematical Statistics, 36, 1049–1051.
  25. Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and its Applications, 9(1), 141–142.
  26. Nawrot, M., Aertsen, A., & Rotter, S. (1999). Single-trial estimation of neuronal firing rates: From single-neuron spike trains to population activity. Journal of Neuroscience Methods, 94(1), 81–92.
  27. Parzen, E. (1962). Estimation of a probability density-function and mode. The Annals of Mathematical Statistics, 33(3), 1065.
  28. Richmond, B. J., Optican, L. M., & Spitzer, H. (1990). Temporal encoding of two-dimensional patterns by single units in primate primary visual cortex. I. Stimulus-response relations. Journal of Neurophysiology, 64(2), 351–369.
  29. Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density-function. The Annals of Mathematical Statistics, 27(3), 832–837.
  30. Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9(2), 65–78.
  31. Sain, S. R. (2002). Multivariate locally adaptive density estimation. Computational Statistics & Data Analysis, 39, 165–186.
  32. Sain, S., & Scott, D. (1996). On locally adaptive density estimation. Journal of the American Statistical Association, 91(436), 1525–1534.
  33. Sain, S., & Scott, D. (2002). Zero-bias locally adaptive density estimators. Scandinavian Journal of Statistics, 29(3), 441–460.
  34. Sanderson, A. (1980). Adaptive filtering of neuronal spike train data. IEEE Transactions on Biomedical Engineering, 27, 271–274.
  35. Scott, D. W. (1992). Multivariate density estimation: Theory, practice, and visualization. New York: Wiley-Interscience.
  36. Scott, D. W., & Terrell, G. R. (1987). Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 82, 1131–1146.
  37. Shimazaki, H., & Shinomoto, S. (2007). A method for selecting the bin size of a time histogram. Neural Computation, 19(6), 1503–1527.
  38. Shimokawa, T., & Shinomoto, S. (2009). Estimating instantaneous irregularity of neuronal firing. Neural Computation, 21(7), 1931–1951.
  39. Shinomoto, S., & Koyama, S. (2007). A solution to the controversy between rate and temporal coding. Statistics in Medicine, 26, 4032–4038.
  40. Shinomoto, S., Kim, H., Shimokawa, T., Matsuno, N., Funahashi, S., Shima, K., et al. (2009). Relating neuronal firing patterns to functional differentiation of cerebral cortex. PLoS Computational Biology, 5, e1000433.
  41. Shinomoto, S., Miyazaki, Y., Tamura, H., & Fujita, I. (2005). Regional and laminar differences in in vivo firing patterns of primate cortical neurons. Journal of Neurophysiology, 94(1), 567–575.
  42. Shinomoto, S., Shima, K., & Tanji, J. (2003). Differences in spiking patterns among cortical neurons. Neural Computation, 15(12), 2823–2842.
  43. Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman & Hall.
  44. Smith, A. C., & Brown, E. N. (2003). Estimating a state-space model from point process observations. Neural Computation, 15(5), 965–991.
  45. Snyder, D. (1975). Random point processes. New York: Wiley.
  46. Watson, G. S. (1964). Smooth regression analysis. Sankhya: The Indian Journal of Statistics, Series A, 26(4), 359–372.

Copyright information

© The Author(s) 2009

Authors and Affiliations

  1. Grün Unit, RIKEN Brain Science Institute, Saitama, Japan
  2. Department of Physics, Kyoto University, Kyoto, Japan
