Kernel bandwidth optimization in spike rate estimation

Shimazaki, Hideaki; Shinomoto, Shigeru

doi:10.1007/s10827-009-0180-4

Open access
Published: 05 August 2009

Volume 29, pages 171–182, (2010)

Journal of Computational Neuroscience

Abstract

The kernel smoother and the time-histogram are classical tools for estimating an instantaneous rate of spike occurrences. We recently established a method for selecting the bin width of the time-histogram, based on the principle of minimizing the mean integrated square error (MISE) between the estimated rate and unknown underlying rate. Here we apply the same optimization principle to the kernel density estimation in selecting the width or “bandwidth” of the kernel, and further extend the algorithm to allow a variable bandwidth, in conformity with data. The variable kernel has the potential to accurately grasp non-stationary phenomena, such as abrupt changes in the firing rate, which we often encounter in neuroscience. In order to avoid possible overfitting that may take place due to excessive freedom, we introduced a stiffness constant for bandwidth variability. Our method automatically adjusts the stiffness constant, thereby adapting to the entire set of spike data. We show that the classical kernel smoother may exhibit goodness-of-fit comparable to, or even better than, that of modern sophisticated rate estimation methods, provided that the bandwidth is selected properly for a given set of spike data, according to the optimization methods presented here.

1 Introduction

Neurophysiologists often investigate responses of a single neuron to a stimulus presented to an animal by using the discharge rate of action potentials, or spikes (Adrian 1928; Gerstein and Kiang 1960; Abeles 1982). One classical method for estimating spike rate is the kernel density estimation (Parzen 1962; Rosenblatt 1956; Sanderson 1980; Richmond et al. 1990; Nawrot et al. 1999). In this method, a spike sequence is convolved with a kernel function, such as a Gauss density function, to obtain a smooth estimate of the firing rate. The estimated rate is sometimes referred to as a spike density function. This nonparametric method is left with a free parameter for kernel bandwidth that determines the goodness-of-fit of the density estimate to the unknown rate underlying data. Although theories have been suggested for selecting the bandwidth by cross-validating with the data (Rudemo 1982; Bowman 1984; Silverman 1986; Scott and Terrell 1987; Scott 1992; Jones et al. 1996; Loader 1999a, b), individual researchers have mostly chosen bandwidth arbitrarily. This is partly because the theories have not spread to the neurophysiology community, and partly due to inappropriate basic assumptions of the theories themselves. Most optimization methods assume a stationary rate fluctuation, while the neuronal firing rate often exhibits abrupt changes, to which neurophysiologists, in particular, pay attention. A fixed bandwidth, optimized using a stationary assumption, is too wide to extract the details of sharp activation, while in the silent period, the fixed bandwidth would be too narrow and may cause spurious undulation in the estimated rate. It is therefore desirable to allow a variable bandwidth, in conformity with data.

The idea of optimizing bandwidth at every instant was proposed by Loftsgaarden and Quesenberry (1965). However, in contrast to the progress in methods that vary bandwidths at sample points only (Abramson 1982; Breiman et al. 1977; Sain and Scott 1996; Sain 2002; Brewer 2004), the local optimization of bandwidth at every instant turned out to be difficult because of its excessive freedom (Scott 1992; Devroye and Lugosi 2000; Sain and Scott 2002). In earlier studies, Hall & Schucany used the cross-validation of Rudemo and Bowman, within local intervals (Hall and Schucany 1989), yet the interval length was left free. Fan et al. applied cross-validation to locally optimized bandwidth (Fan et al. 1996), and yet the smoothness of the variable bandwidth was chosen manually.

In this study, we first revisit the fixed kernel method, and derive a simple formula to select the bandwidth of the kernel density estimation, similar to the previous method for selecting the bin width of a peristimulus time histogram (see Shimazaki and Shinomoto 2007). Next, we introduce the variable bandwidth into the kernel method and derive an algorithm for determining the bandwidth locally in time. The method automatically adjusts the flexibility, or the stiffness, of the variable bandwidth. The performance of our fixed and variable kernel methods is compared with established density estimation methods, in terms of the goodness-of-fit to underlying rates that vary either continuously or discontinuously. We also apply our kernel methods to biological data, and examine their ability by cross-validating with data.

Though our methods are based on the classical kernel method, their performances are comparable to various sophisticated rate estimation methods. Because they build on a classical technique, they are also convenient for users: the methods simply suggest a bandwidth for the standard kernel density estimation.

2 Methods

2.1 Kernel smoothing

In neurophysiological experiments, neuronal response is examined by repeatedly applying identical stimuli. The recorded spike trains are aligned at the onset of stimuli, and superimposed to form a raw density, as

$$ x_{t}=\frac{1}{n} \sum_{i=1}^{N}{{\delta\left( {t-t}_{i}\right) }}, \label{rawdensity} $$

(1)

where n is the number of repeated trials. Here, each spike is regarded as a point event that occurs at an instant of time $t_i$ (i = 1, 2, ⋯, N, where N is the total number of spikes pooled over the n trials) and is represented by the Dirac delta function δ(t). The kernel density estimate is obtained by convolving the raw density $x_t$ with a kernel k(s),

$$ \hat{\lambda}_{t}=\int x_{t-s} k{\left( {s}\right) }\,ds. \label{kerneldensity} $$

(2)

Throughout this study, the integral $\int$ that does not specify bounds refers to $\int_{-\infty}^{\infty}$. The kernel function satisfies the normalization condition, $\int k(s) \,ds=1$, a zero first moment, $\int s k(s) \, ds=0$, and has a finite bandwidth, $w^2 = \int s^2 k(s) \,ds < \infty$. A frequently used kernel is the Gauss density function,

$$ k_{w}(s) = \frac{1}{\sqrt{2 \pi }w} \exp{\left( -\frac{s^2}{2 w^2} \right)}, \label{gaussdensity} $$

(3)

where the bandwidth w is specified as a subscript. In the body of this study, we develop optimization methods that apply generally to any kernel function, and derive a specific algorithm for the Gauss density function in the Appendix.
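
As a minimal sketch of Eqs. (1)–(3): substituting the raw density into the convolution turns the estimate into a sum of kernels centered at the spike times, $\hat{\lambda}_t = \frac{1}{n}\sum_i k_w(t - t_i)$. The function names, the evaluation grid, and the spike times below are hypothetical and serve only as illustration.

```python
import numpy as np

def gauss_kernel(s, w):
    """Gauss density function of Eq. (3) with bandwidth w."""
    return np.exp(-s**2 / (2.0 * w**2)) / (np.sqrt(2.0 * np.pi) * w)

def kernel_rate(t_grid, spike_times, w, n_trials):
    """Eq. (2) applied to the raw density of Eq. (1): (1/n) * sum_i k_w(t - t_i)."""
    t_grid = np.asarray(t_grid, dtype=float)
    spike_times = np.asarray(spike_times, dtype=float)
    # diffs[a, i] = t_a - t_i for every grid point and every spike
    diffs = t_grid[:, None] - spike_times[None, :]
    return gauss_kernel(diffs, w).sum(axis=1) / n_trials

# Hypothetical data: five spikes pooled over two trials (times in seconds).
spikes = np.array([0.10, 0.12, 0.50, 0.53, 0.90])
t = np.linspace(-1.0, 2.0, 3001)
rate = kernel_rate(t, spikes, w=0.05, n_trials=2)
```

Because the kernel is normalized, the estimated rate integrates to N/n, the mean number of spikes per trial, whatever the bandwidth.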

2.2 Mean integrated squared error optimization principle

Assuming that spikes are sampled from a stochastic process, we consider optimizing the estimate $\hat{\lambda}_{t}$ to be closest to the unknown underlying rate $\lambda_t$. Among several plausible optimizing principles, such as the Kullback-Leibler divergence or the Hellinger distance, we adopt here the mean integrated squared error (MISE) for measuring the goodness-of-fit of an estimate to the unknown underlying rate, as

$$ \mbox{MISE}=\int_{a}^{b}E(\hat{\lambda}_{t}-\lambda_{t})^{2} \,dt, \label{eq:MISE} $$

(4)

where E refers to the expectation with respect to the spike generation process under a given inhomogeneous rate $\lambda_t$. It follows, by definition, that $Ex_t = \lambda_t$.

In deriving optimization methods, we assume the Poisson nature, so that spikes are randomly sampled at a given rate $\lambda_t$. Spikes recorded from a single neuron are correlated within each sequence (Shinomoto et al. 2003, 2005, 2009). In the limit of a large number of spike trains, however, mixed spikes are statistically independent and the superimposed sequence can be approximated as a single inhomogeneous Poisson point process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005).

2.3 Selection of the fixed bandwidth

Given a kernel function such as Eq. (3), the density function Eq. (2) is uniquely determined for a raw density Eq. (1) of spikes obtained from an experiment. A bandwidth w of the kernel may alter the density estimate, and it can accordingly affect the goodness-of-fit of the density function $\hat{\lambda}_{t}$ to the unknown underlying rate $\lambda_t$. In this subsection, we consider applying a kernel of a fixed bandwidth w, and develop a method for selecting w that minimizes the MISE, Eq. (4).

The integrand of the MISE is decomposed into three parts: $E\hat{\lambda}_{t}^{2}-2\lambda_{t}E\hat{\lambda}_{t}+\lambda_{t}^{2}$. Since the last component does not depend on the choice of a kernel, we subtract it from the MISE, then define a cost function as a function of the bandwidth w:

$$\begin{array}{rcl} C_{n}\left( w\right) & = & \mbox{MISE}-\int_{a}^{b}\lambda_{t} ^{2}\,dt \\ & =& \int_{a}^{b}E\hat{\lambda}_{t}^{2}\,dt-2\int_{a}^{b}\lambda_{t} E\hat{\lambda}_{t}\,dt.\label{eq:Cost_Function1} \end{array} $$

(5)

Rudemo and Bowman suggested the leave-one-out cross-validation to estimate the second term of Eq. (5) (Rudemo 1982; Bowman 1984). Here, we directly estimate the second term with the Poisson assumption (See also Shimazaki and Shinomoto 2007).

By noting that $\lambda_t = Ex_t$, the integrand of the second term in Eq. (5) is given as

$$ E x_{t} E \hat{\lambda}_{t} = E\big[x_{t}\hat{\lambda}_{t}\big] -E\big[(x_{t}-Ex_{t})\big(\hat{\lambda}_{t}-E\hat{\lambda}_{t}\big)\big], \label{correlation_decomposition} $$

(6)

from a general decomposition of covariance of two random variables. Using Eq. (2), the covariance (the second term of Eq. (6)) is obtained as

$$\begin{array}{rcl} &&{\kern-6pt} E\big[(x_{t}-Ex_{t})\big(\hat{\lambda}_{t}-E\hat{\lambda} _{t}\big)\big] \\ &&{\kern12pt} = \int k_w \left( {t-s}\right) E\left[ (x_{t}-Ex_{t}) \left( x_{s}-Ex_{s}\right) \right] \,ds \\ &&{\kern12pt} = \int{k}_{w}{\left( {t-s}\right) }\left[ \delta\left( t-s\right) \frac{1}{n}Ex_{s}\right] \,ds \\ &&{\kern12pt} =\frac{{1}}{n} {k}_{w}(0)Ex_{t}.\label{eq:covariance} \end{array} $$

(7)

Here, to obtain the second equality, we used the assumption of the Poisson point process (independent spikes).

Using Eqs. (6) and (7), Eq. (5) becomes

$$\begin{array}{rcl} C_{n}\left( w\right) &=&\int_{a}^{b}E\hat{\lambda}_{t}^{2}\,dt \\ && -2\int_{a}^{b}\left\{E\big[x_{t}\hat{\lambda}_{t}\big]-\frac{1}{n}k_{w}(0)Ex_{t}\right\} \,dt. \label{eq:Cost_Function_Fix2} \end{array} $$

(8)

Equation (8) is composed of observable variables only. Hence, from sample sequences, the cost function is estimated as

$$ \hat{C}_{n}\left( w\right) =\int_{a}^{b}\hat{\lambda}_{t}^{2}\,dt-2\int _{a}^{b}\left\{x_{t}\hat{\lambda}_{t}-\frac{1}{n}k_{w}(0)x_{t}\right\} \,dt. \label{eq:Cost_Function_Fix_Estimated1} $$

(9)

In terms of a kernel function, the cost function is written as

$$\begin{array}{rcl} \hat{C}_{n}\left( w\right) &=& \frac{1}{n^{2}}\sum\limits_{i,j}\psi_{w}\left( t_{i},t_{j}\right) \\ && -\frac{2}{n^{2}}\left\{\sum\limits_{i,j}k_{w}\left( t_{i}-t_{j}\right) -k_{w} (0)N\,\right\} \\ &=& \frac{1}{n^2} \sum\limits_{i,j}\psi_{w}\left( t_{i},t_{j}\right) -\frac{2}{n^2} \sum\limits_{i\neq j}k_{w}\left( t_{i}-t_{j}\right), \label{eq:Cost_Function_Fix_Estimated} \end{array} $$

(10)

where

$$ \psi_{w}\left( t_{i},t_{j}\right) =\int_{a}^{b}{k}_{w}{\left( t{-} t_{i}\right) k}_{w}{\left( t{-}t_{j}\right) }\,dt.\label{eq:phai_fix} $$

(11)
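
The step from Eq. (9) to Eq. (10) can be made explicit as follows, under the assumption that all N spikes fall inside [a, b]. Substituting Eq. (1) into Eq. (2) gives $\hat{\lambda}_t = \frac{1}{n}\sum_i k_w(t - t_i)$, so that the three integrals in Eq. (9) evaluate to

$$\begin{array}{rcl} \int_{a}^{b}\hat{\lambda}_{t}^{2}\,dt &=& \frac{1}{n^{2}}\sum\limits_{i,j}\psi_{w}\left( t_{i},t_{j}\right), \\ \int_{a}^{b}x_{t}\hat{\lambda}_{t}\,dt &=& \frac{1}{n}\sum\limits_{i}\hat{\lambda}_{t_{i}} = \frac{1}{n^{2}}\sum\limits_{i,j}k_{w}\left( t_{i}-t_{j}\right), \\ \frac{1}{n}k_{w}(0)\int_{a}^{b}x_{t}\,dt &=& \frac{1}{n^{2}}k_{w}(0)N. \end{array} $$

The diagonal terms i = j of the double sum in the second line contribute exactly $k_w(0)N$, which cancels the third line and leaves the sum over i ≠ j in Eq. (10).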

The minimizer of the cost function, Eq. (10), is an estimate of the optimal bandwidth, which is denoted by $w^{*}$. The method for selecting a fixed kernel bandwidth is summarized in Algorithm 1. A particular algorithm developed for the Gauss density function is given in the Appendix.
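
The fixed bandwidth selection can be sketched as a grid search over Eq. (10). This is a minimal sketch rather than the Appendix algorithm: it assumes a Gauss kernel and extends the integration range [a, b] in Eq. (11) to the whole real line, in which case $\psi_w(t_i, t_j)$ is itself a Gauss density of width $\sqrt{2}w$ evaluated at $t_i - t_j$. Function names, candidate bandwidths, and spike times are hypothetical.

```python
import numpy as np

def gauss_kernel(s, w):
    """Gauss density function of Eq. (3) with bandwidth w."""
    return np.exp(-s**2 / (2.0 * w**2)) / (np.sqrt(2.0 * np.pi) * w)

def fixed_cost(spike_times, w, n_trials):
    """Estimated cost of Eq. (10) for a Gauss kernel.

    Assumption: [a, b] is replaced by (-inf, inf), so psi_w(t_i, t_j)
    of Eq. (11) equals a Gauss density of width sqrt(2) * w.
    """
    t = np.asarray(spike_times, dtype=float)
    d = t[:, None] - t[None, :]                 # pairwise differences t_i - t_j
    psi_sum = gauss_kernel(d, np.sqrt(2.0) * w).sum()
    k = gauss_kernel(d, w)
    k_offdiag_sum = k.sum() - np.trace(k)       # sum over i != j
    return (psi_sum - 2.0 * k_offdiag_sum) / n_trials**2

def select_fixed_bandwidth(spike_times, n_trials, candidates):
    """Return the candidate bandwidth with the smallest estimated cost."""
    costs = np.array([fixed_cost(spike_times, w, n_trials) for w in candidates])
    return candidates[int(np.argmin(costs))], costs

# Hypothetical data: spikes pooled over two trials (times in seconds).
spikes = np.array([0.10, 0.12, 0.15, 0.50, 0.53, 0.55, 0.58, 0.90])
candidates = np.linspace(0.01, 0.5, 50)
w_star, costs = select_fixed_bandwidth(spikes, n_trials=2, candidates=candidates)
```

The pairwise matrices need memory proportional to N²; for long recordings one would evaluate the sums in blocks or restrict them to nearby spike pairs.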

2.4 Selection of the variable bandwidth

The method described in Section 2.3 aims to select a single bandwidth that optimizes the goodness-of-fit of the rate estimate for an entire observation interval [a, b]. For a non-stationary case, in which the degree of rate fluctuation greatly varies in time, the rate estimation may be improved by using a kernel function whose bandwidth is adaptively selected in conformity with data. The spike rate estimated with the variable bandwidth $w_t$ is given by

$$ \hat{\lambda}_{t}=\int x_{t-s} k_{w_{t}}\left( s \right)\,ds. $$

(12)

Here we select the variable bandwidth $w_t$ as a fixed bandwidth optimized in a local interval. In this approach, the interval length for the local optimization regulates the shape of the function $w_t$ and therefore determines the goodness-of-fit of the estimated rate to the underlying rate. We provide a method for obtaining the variable bandwidth $w_t$ that minimizes the MISE by optimizing the local interval length.

To select an interval length for local optimization, we introduce the local MISE criterion at time t as

$$ \mbox{{\em local}MISE}=\int E\left(\hat{\lambda}_{u} - \lambda_{u}\right)^{2}\rho_{W} ^{u-t}du, $$

(13)

where $\hat{\lambda}_{u} = \int x_{u-s} k_{w}(s) \,ds$ is an estimated rate with a fixed bandwidth w. Here, a weight function $\rho _{W}^{u-t}$ localizes the integration of the squared error in a characteristic interval W centered at time t. An example of the weight function is once again the Gauss density function. See the Appendix for the specific algorithm for the Gauss weight function. As in Eq. (5), we introduce the local cost function at time t by subtracting the term irrelevant for the choice of w as

$$ C_{n}^{t}\left( w,W\right) =\mbox{{\em local}MISE}-\int\lambda_{u} ^{2}\rho\,_{W}^{u-t} \,du. $$

(14)

The optimal fixed bandwidth $w^{*}$ is obtained as a minimizer of the estimated cost function:

$$ \begin{array}{rcl} \hat{C}_{n}^t\left( w,W\right) &=&\frac{1}{n^2}\sum\limits_{i,j}\psi_{w,W}^{t}\left( t_{i} ,t_{j}\right) \\ && -\frac{2}{n^2} \sum\limits_{i\neq j}k_{w}\left( t_{i}-t_{j}\right) \rho _{W}^{t_{i}-t}, \label{eq:CostFunction_Local_Estimated} \end{array} $$

(15)

where

$$ \psi_{w,W}^{t}\left( t_{i},t_{j}\right) =\int k_{w}\left( u-t_{i}\right) k_{w}\left( u-t_{j}\right) \rho_{W}^{u-t}du.\label{eq:phai_local} $$

(16)
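
A minimal sketch of the local cost function, Eq. (15), assuming that both the kernel $k_w$ and the weight function $\rho_W$ are Gauss densities. Under that assumption the integral in Eq. (16) has a closed form: the product of the two kernels is a Gauss density in u centered at the midpoint $(t_i + t_j)/2$ with width $w/\sqrt{2}$, and integrating it against the weight gives a Gauss density of width $\sqrt{W^2 + w^2/2}$. The function names are hypothetical, and the closed form is derived here rather than taken from the Appendix.

```python
import numpy as np

def gauss(s, w):
    """Gauss density with standard deviation w."""
    return np.exp(-s**2 / (2.0 * w**2)) / (np.sqrt(2.0 * np.pi) * w)

def local_psi(ti, tj, t, w, W):
    """Closed form of Eq. (16) for a Gauss kernel and a Gauss weight."""
    return gauss(ti - tj, np.sqrt(2.0) * w) * gauss(
        0.5 * (ti + tj) - t, np.sqrt(W**2 + 0.5 * w**2)
    )

def local_cost(spike_times, t, w, W, n_trials):
    """Estimated local cost of Eq. (15) at time t."""
    x = np.asarray(spike_times, dtype=float)
    psi = local_psi(x[:, None], x[None, :], t, w, W)
    # k[i, j] = k_w(t_i - t_j) * rho_W(t_i - t)
    k = gauss(x[:, None] - x[None, :], w) * gauss(x - t, W)[:, None]
    k_offdiag_sum = k.sum() - np.trace(k)       # sum over i != j
    return (psi.sum() - 2.0 * k_offdiag_sum) / n_trials**2
```

For a given time t and interval length W, the locally optimal bandwidth is then the value of w that minimizes `local_cost` over a set of candidates.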

The derivation follows the same steps as in the previous section. Depending on the interval length W, the optimal bandwidth $w^{*}$ varies. We suggest selecting an interval length that scales with the optimal bandwidth as $\gamma^{-1} w^{*}$. The parameter γ regulates the interval length for local optimization: with small γ (≪ 1), the fixed bandwidth is optimized within a long interval; with large γ (~1), the fixed bandwidth is optimized within a short interval. The interval length and fixed bandwidth, selected at time t, are denoted as $W_{t}^{\gamma}$ and $\bar{w}_{t}^{\gamma}$.

The locally optimized bandwidth $\bar{w}_{t}^{\gamma}$ is repeatedly obtained for different t ∈ [a, b]. Because the intervals overlap, we adopt the Nadaraya-Watson kernel regression (Nadaraya 1964; Watson 1964) of $\bar {w}_{t}^{\gamma}$ as a local bandwidth at time t:

$$ w^\gamma_{t}=\left. \int\rho_{W_{s}^\gamma}^{t-s}\bar{w}_{s}^\gamma \,ds\right/ \int\rho_{W_{s}^\gamma}^{t-s}\,ds. \label{eq:Nadaraya-Watson} $$

(17)
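
Eq. (17) can be sketched on a uniform time grid, where the grid spacing cancels between numerator and denominator and the integrals reduce to sums. The sketch assumes a Gauss weight function; the function name and array arguments are hypothetical.

```python
import numpy as np

def gauss(s, w):
    """Gauss density with standard deviation w (w may be an array)."""
    return np.exp(-s**2 / (2.0 * w**2)) / (np.sqrt(2.0 * np.pi) * w)

def smooth_bandwidth(t_grid, w_bar, W_loc):
    """Nadaraya-Watson regression of Eq. (17) on a uniform grid.

    w_bar[b] is the locally optimized bandwidth at grid time s_b and
    W_loc[b] is the interval length selected at that time.
    """
    t_grid = np.asarray(t_grid, dtype=float)
    w_bar = np.asarray(w_bar, dtype=float)
    W_loc = np.asarray(W_loc, dtype=float)
    # weights[a, b] = rho_{W_{s_b}}(t_a - s_b)
    weights = gauss(t_grid[:, None] - t_grid[None, :], W_loc[None, :])
    return (weights * w_bar[None, :]).sum(axis=1) / weights.sum(axis=1)
```

The result is a weighted average of the local bandwidths, so it always stays between the smallest and largest value of `w_bar`.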

The variable bandwidth $w_{t}^{\gamma}$ obtained from the same data, but with different γ, exhibits different degrees of smoothness: with small γ (≪ 1), the variable bandwidth fluctuates slightly; with large γ (~1), the variable bandwidth fluctuates significantly. The parameter γ is thus a smoothing parameter for the variable bandwidth. Similar to the fixed bandwidth, the goodness-of-fit of the variable bandwidth can be estimated from the data. The cost function for the variable bandwidth selected with γ is obtained as

$$ \hat{C_n}\left( \gamma \right) =\int_{a}^{b}\hat{\lambda}_{t}^{2} \,dt- \frac{2}{n^2}\sum_{i\neq j}k_{w_{t_i}^\gamma}\left( t_{i}-t_{j}\right) ,\label{eq:CostFunction_Variable_Estimated} $$

(18)

where $\hat{\lambda}_{t}=\int x_{t-s} k_{w_t^\gamma}\left( s\right)ds$ is an estimated rate, with the variable bandwidth $w_t^\gamma$. The integral is calculated numerically. With the stiffness constant $\gamma^{*}$ that minimizes Eq. (18), local optimization is performed in an ideal interval length. The method for optimizing the variable kernel bandwidth is summarized in Algorithm 2.
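
A minimal sketch of evaluating Eq. (18) for a given bandwidth function, assuming a Gauss kernel, a uniform time grid covering [a, b], a trapezoidal rule for the integral, and linear interpolation to read the bandwidth at each spike time. These numerical choices and all names are assumptions made for illustration.

```python
import numpy as np

def gauss(s, w):
    """Gauss density with standard deviation w (w may be an array)."""
    return np.exp(-s**2 / (2.0 * w**2)) / (np.sqrt(2.0 * np.pi) * w)

def variable_cost(t_grid, w_grid, spike_times, n_trials):
    """Estimated cost of Eq. (18) for a bandwidth w_t sampled on t_grid."""
    t_grid = np.asarray(t_grid, dtype=float)
    w_grid = np.asarray(w_grid, dtype=float)
    x = np.asarray(spike_times, dtype=float)
    # Rate estimate using the bandwidth w_t at each evaluation time t.
    lam = gauss(t_grid[:, None] - x[None, :], w_grid[:, None]).sum(axis=1) / n_trials
    dt = t_grid[1] - t_grid[0]
    sq = lam**2
    integral = dt * (sq.sum() - 0.5 * (sq[0] + sq[-1]))   # trapezoidal rule
    # Bandwidth at each spike time t_i, by linear interpolation of w_t.
    w_at_spikes = np.interp(x, t_grid, w_grid)
    k = gauss(x[:, None] - x[None, :], w_at_spikes[:, None])
    cross = k.sum() - np.trace(k)                          # sum over i != j
    return integral - 2.0 * cross / n_trials**2
```

Repeating this evaluation for the bandwidth functions obtained with several values of γ and keeping the smallest cost gives an estimate of the stiffness constant.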

3 Results

3.1 Comparison of the fixed and variable kernel methods

By using spikes sampled from an inhomogeneous Poisson point process, we examined the efficiency of the kernel methods in estimating the underlying rate. We also used a sequence obtained by superimposing ten non-Poissonian (gamma) sequences (Shimokawa and Shinomoto 2009), but there was practically no significant difference in the rate estimation from the Poissonian sequence.
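
Test data of this kind can be generated, for example, by thinning a homogeneous Poisson process. The sketch below is one standard way to do so and is not necessarily the sampler used for the figures; the rate function, its upper bound, the interval, and the seed are hypothetical.

```python
import numpy as np

def sample_inhomogeneous_poisson(rate_fn, rate_max, a, b, rng):
    """Thinning: draw candidates at the constant rate rate_max on [a, b],
    then keep each candidate t with probability rate_fn(t) / rate_max."""
    n_candidates = rng.poisson(rate_max * (b - a))
    candidates = np.sort(rng.uniform(a, b, size=n_candidates))
    keep = rng.uniform(size=n_candidates) < rate_fn(candidates) / rate_max
    return candidates[keep]

def rate_fn(t):
    """Hypothetical sinusoidal rate in spikes per second (never above 30)."""
    return 20.0 + 10.0 * np.sin(2.0 * np.pi * t)

rng = np.random.default_rng(0)
n_trials = 10
trials = [sample_inhomogeneous_poisson(rate_fn, 30.0, 0.0, 2.0, rng)
          for _ in range(n_trials)]
pooled = np.sort(np.concatenate(trials))   # superimposed spike sequence
```

The bound `rate_max` must be at least as large as the rate everywhere on [a, b]; otherwise the acceptance probability exceeds one and the sample is biased.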

Figure 1 displays the result of the fixed kernel method based on the Gauss density function. The kernel bandwidth selected by Algorithm 1 applies a reasonable filtering to the set of spike sequences. Figure 1(d) shows that a cost function, Eq. (10), estimated from the spike data is similar to the original MISE, Eq. (4), which was computed using the knowledge of the underlying rate. This demonstrates that MISE optimization can, in practice, be carried out by our method, even without knowing the underlying rate.

Figure 2(a) demonstrates how the rate estimation is altered by replacing the fixed kernel method with the variable kernel method (Algorithm 2), for identical spike data (Fig. 1(b)). The Gauss weight function is used to obtain a smooth variable bandwidth. The manner in which the optimized bandwidth varies in the time axis is shown in Fig. 2(b): the bandwidth is short in a moment of sharp activation, and is long in the period of smooth rate modulation. Eventually, the sharp activation is grasped minutely and slow modulation is expressed without spurious fluctuations. The stiffness constant γ for the bandwidth variation is selected by minimizing the cost function, as shown in Fig. 2(c).

3.2 Comparison with established density estimation methods

We wish to examine the fitting performance of the fixed and variable kernel methods in comparison with established density estimation methods, by paying attention to their aptitudes for either continuous or discontinuous rate processes. Figure 3(a) shows the results for sinusoidal and sawtooth rate processes, as samples of continuous and discontinuous processes, respectively. We also examined triangular and rectangular rate processes as different samples of continuous and discontinuous processes, but the results were similar. The goodness-of-fit of the density estimate to the underlying rate is evaluated in terms of integrated squared error (ISE) between them.

The established density estimation methods examined for comparison are the histogram (Shimazaki and Shinomoto 2007), Abramson’s adaptive kernel (Abramson 1982), Locfit (Loader 1999b), and Bayesian adaptive regression splines (BARS) (DiMatteo et al. 2001; Kass et al. 2003) methods, whose details are summarized below.

A histogram method, which is often called a peristimulus time histogram (PSTH) in neurophysiological literature, is the most basic method for estimating the spike rate. To optimize the histogram, we used a method proposed for selecting the bin width based on the MISE principle (Shimazaki and Shinomoto 2007).

Abramson’s adaptive kernel method (Abramson 1982) uses the sample point kernel estimate $\hat \lambda_t = \sum_i k_{w_{t_i}} (t - t_i)$, in which the bandwidths are adapted at the sample points. Scaling the bandwidths as $w_{t_i} = w \, (g / \hat \lambda_{t_i} )^{1/2} $ was suggested, where w is a pilot bandwidth, $g= ( \prod\nolimits_i \hat \lambda_{t_i} )^{1/N}$, and $\hat\lambda_t$ is a fixed kernel estimate with w. Abramson’s method is a two-stage method, in which the pilot bandwidth needs to be selected beforehand. Here, the pilot bandwidth is selected using the fixed kernel optimization method developed in this study.
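
The two stages of Abramson's method can be sketched as follows, assuming a Gauss kernel. The factor 1/n is added here so that the estimate is a rate per trial, consistent with Eq. (1); the function names and the way the pilot bandwidth is passed in are hypothetical.

```python
import numpy as np

def gauss(s, w):
    """Gauss density with standard deviation w (w may be an array)."""
    return np.exp(-s**2 / (2.0 * w**2)) / (np.sqrt(2.0 * np.pi) * w)

def abramson_bandwidths(spike_times, w_pilot, n_trials):
    """Square-root law: w_{t_i} = w * sqrt(g / pilot rate at t_i)."""
    x = np.asarray(spike_times, dtype=float)
    # Stage 1: fixed kernel (pilot) estimate evaluated at the spike times.
    pilot = gauss(x[:, None] - x[None, :], w_pilot).sum(axis=1) / n_trials
    g = np.exp(np.mean(np.log(pilot)))      # geometric mean of the pilot rates
    return w_pilot * np.sqrt(g / pilot)

def abramson_rate(t_grid, spike_times, w_pilot, n_trials):
    """Stage 2: sample point estimate with one bandwidth per spike."""
    t_grid = np.asarray(t_grid, dtype=float)
    x = np.asarray(spike_times, dtype=float)
    bw = abramson_bandwidths(x, w_pilot, n_trials)
    return gauss(t_grid[:, None] - x[None, :], bw[None, :]).sum(axis=1) / n_trials
```

Spikes in dense regions receive bandwidths narrower than the pilot and isolated spikes receive wider ones, while the geometric mean of all bandwidths stays equal to the pilot bandwidth.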

The Locfit algorithm developed by Loader (1999b) fits a polynomial to a log-density function under the principle of maximizing a locally defined likelihood. We examined the automatic choice of the adaptive bandwidth of the local likelihood, and found that the default fixed method yielded a significantly better fit. We used a nearest neighbor based bandwidth method, with a parameter covering 20% of the data.

The BARS (DiMatteo et al. 2001; Kass et al. 2003) is a spline-based adaptive regression method on an exponential family response model, including a Poisson count distribution. The rate estimated with the BARS is the expected splines computed from the posterior distribution on the knot number and locations with a Markov chain Monte Carlo method. The BARS is, thus, capable of smoothing a noisy histogram without missing abrupt changes. To create an initial histogram, we used 4 [ms] bin width, which is small enough to examine rapid changes in the firing rate.

Figure 3(a) displays the density profiles of the six different methods estimated from an identical set of spike trains (n = 10) that are numerically sampled from a sinusoidal or sawtooth underlying rate (2 [s]). Figure 3(b) summarizes the goodness-of-fit of the six methods to the sinusoidal and sawtooth rates (10 [s]) by averaging over 20 realizations of samples.

For the sinusoidal rate function, representing continuously varying rate processes, the BARS is most efficient in terms of ISE performance. For the sawtooth rate function, representing discontinuous non-stationary rate processes, the variable kernel estimation developed here is the most efficient in grasping abrupt rate changes. The histogram method is always inferior to the other five methods in terms of ISE performance, due to the jagged nature of the piecewise constant function.

3.3 Application to experimental data

We examine, here, the fixed and variable kernel methods in their applicability to real biological data. In particular, the kernel methods are applied to the spike data of an MT neuron responding to a random dot stimulus (Britten et al. 2004). The rates estimated from n = 1, 10, and 30 experimental trials are shown in Fig. 4. Fine details of rate modulation are revealed as we increase the sample size (Bair and Koch 1996). The fixed kernel method tends to choose narrower bandwidths, while the variable kernel method tends to choose wider bandwidths in the periods in which spikes are not abundant.

The performance of the rate estimation methods is cross-validated. The bandwidth, denoted as $w_t$ for both fixed and variable, is obtained with a training data set of n trials. The error is evaluated by computing the cost function, Eq. (18), in a cross-validatory manner:

$${{\hat C}_n}\left( w_t \right) = \int_a^b {\hat \lambda _t^2} \, dt - \frac{2}{{{n^2}}}\sum\limits_{i \ne j} {{k_{w_{{t^\prime_i}}^{}}}} \left( {{t^\prime_i} - {t^\prime_j}} \right), \label{eq:CrossValidated_Costfunction} $$

(19)

where the test spike times $\{t^\prime_i\}$ are obtained from n spike sequences held out from the training set, and $\hat \lambda _t = \frac{1}{n} \sum_i k_{w_t} (t - t^\prime_i)$. Figure 4(d) shows the performance improvements by the variable bandwidth over the fixed bandwidth, as evaluated by Eq. (19). The fixed and variable kernel methods perform better for smaller and larger sizes of data, respectively. In addition, we compared the fixed kernel method and the BARS by cross-validating the log-likelihood of a Poisson process with the rate estimated using the two methods. The difference in the log-likelihoods was not statistically significant for small samples (n = 1, 5 and 10), while the fixed kernel method fitted better to the spike data with larger samples (n = 20 and 30).
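
The exact form of the cross-validated log-likelihood is not spelled out in the text. A minimal sketch, assuming the standard log-likelihood of an inhomogeneous Poisson process summed over the held-out trials, $\sum_i \log \hat{\lambda}_{t^\prime_i} - n \int_a^b \hat{\lambda}_t \,dt$, with the rate given on a uniform grid, is shown below. The function name and arguments are hypothetical.

```python
import numpy as np

def poisson_log_likelihood(t_grid, rate_grid, test_spikes, n_test_trials):
    """Log-likelihood of pooled test spikes under a per-trial rate.

    rate_grid holds the estimated rate on the uniform grid t_grid, which is
    assumed to span the observation interval [a, b] and to be strictly
    positive at every test spike time.
    """
    t_grid = np.asarray(t_grid, dtype=float)
    rate_grid = np.asarray(rate_grid, dtype=float)
    test_spikes = np.asarray(test_spikes, dtype=float)
    dt = t_grid[1] - t_grid[0]
    # Trapezoidal rule for the integral of the rate over [a, b].
    integral = dt * (rate_grid.sum() - 0.5 * (rate_grid[0] + rate_grid[-1]))
    rate_at_spikes = np.interp(test_spikes, t_grid, rate_grid)
    return np.sum(np.log(rate_at_spikes)) - n_test_trials * integral
```

A larger value indicates that the rate estimated from the training trials explains the held-out spikes better.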

4 Discussion

In this study, we developed methods for selecting the kernel bandwidth in the spike rate estimation based on the MISE minimization principle. In addition to the principle of optimizing a fixed bandwidth, we further considered selecting the bandwidth locally in time, assuming a non-stationary rate modulation.

We tested the efficiency of our methods using spike sequences numerically sampled from a given rate (Figs. 1 and 2). Various density estimators constructed on different optimization principles were compared in their goodness-of-fit to the underlying rate (Fig. 3). There is in fact no oracle that selects one among various optimization principles, such as MISE minimization or likelihood maximization. Practically, reasonable principles render similar detectability for rate modulation; the kernel methods based on MISE were roughly comparable to the Locfit based on likelihood maximization in their performances. The difference of the performances is not due to the choice of principles, but rather due to techniques; kernel and histogram methods lead to completely different results under the same MISE minimization principle (Fig. 3(b)). Among the smooth rate estimators, the BARS was good at representing continuously varying rate, while the variable kernel method was good at grasping abrupt changes in the rate process (Fig. 3(b)).

We also examined the performance of our methods in application to neuronal spike sequences by cross-validating with the data (Fig. 4). The result demonstrated that the fixed kernel method performed well in small samples. We refer to Cunningham et al. (2008) for a result on the superior fitting performance of a fixed kernel to small samples in comparison with the Locfit and BARS, as well as the Gaussian process smoother (Cunningham et al. 2008; Smith and Brown 2003; Koyama and Shinomoto 2005). The adaptive methods, however, have the potential to outperform the fixed method with larger samples derived from a non-stationary rate profile (See also Endres et al. 2008 for comparisons of their adaptive histogram with the fixed histogram and kernel method). The result in Fig. 4 confirmed the utility of our variable kernel method for larger samples of neuronal spikes.

We derived the optimization methods under the Poisson assumption, so that spikes are randomly drawn from a given rate. If one wishes to estimate the spike rate of a single or a few sequences that contain strongly correlated spikes, it is desirable to utilize information about the non-Poisson nature of a spike train (Cunningham et al. 2008). Note that a non-Poisson spike train may be dually interpreted, as being derived either irregularly from a constant rate, or regularly from a fluctuating rate (Koyama and Shinomoto 2005; Shinomoto and Koyama 2007). However, a sequence obtained by superimposing many spike trains is approximated as a Poisson process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005), for which dual interpretation does not occur. Thus the kernel methods developed in this paper are valid for the superimposed sequence, and serve as the peristimulus density estimator for spike trains aligned at the onset or offset of the stimulus.

The kernel smoother is a classical method for estimating the firing rate, as popular as the histogram method. We have shown in this paper that the classical kernel methods perform well in the goodness-of-fit to the underlying rate. They are not only superior to the histogram method, but also comparable to modern sophisticated methods, such as the Locfit and BARS. In particular, the variable kernel method outperformed competing methods in representing abrupt changes in the spike rate, which we often encounter in neuroscience. Given its simplicity and familiarity, the kernel smoother can still be the most useful tool for analyzing spike data, provided that the bandwidth is chosen appropriately as instructed in this paper.

References

Abeles, M. (1982). Quantification, smoothing, and confidence-limits for single-units histograms. Journal of Neuroscience Methods, 5(4), 317–325.
Article CAS PubMed Google Scholar
Abramson, I. (1982). On bandwidth variation in kernel estimates-a square root law. The Annals of Statistics, 10(4), 1217–1223.
Article Google Scholar
Adrian, E. (1928). The basis of sensation: The action of the sense organs. New York: W.W. Norton.
Google Scholar
Bair, W., & Koch, C. (1996). Temporal precision of spike trains in extrastriate cortex of the behaving macaque monkey. Neural Computation, 8(6), 1185–1202.
Article CAS PubMed Google Scholar
Bowman, A. W. (1984). An alternative method of cross-validation for the smoothing of density estimates. Biometrika, 71(2), 353.
Article Google Scholar
Breiman, L., Meisel, W., & Purcell, E. (1977). Variable kernel estimates of multivariate densities. Technometrics, 19, 135–144.
Article Google Scholar
Brewer, M. J. (2004). A Bayesian model for local smoothing in kernel density estimation. Statistics and Computing, 10, 299–309.
Article Google Scholar
Britten, K. H., Shadlen, M. N., Newsome, W. T., & Movshon, J. A. (2004). Responses of single neurons in macaque mt/v5 as a function of motion coherence in stochastic dot stimuli. The Neural Signal Archive. nsa2004.1. http://www.neuralsignal.org.
Cox, R. D. (1962). Renewal theory. London: Wiley.
Google Scholar
Cunningham, J., Yu, B., Shenoy, K., Sahani, M., Platt, J., Koller, D., et al. (2008). Inferring neural firing rates from spike trains using Gaussian processes. Advances in Neural Information Processing Systems, 20, 329–336.
Google Scholar
Daley, D., & Vere-Jones, D. (1988). An introduction to the theory of point processes. New York: Springer.
Google Scholar
Devroye, L., & Lugosi, G. (2000). Variable kernel estimates: On the impossibility of tuning the parameters. In E. Giné, D. Mason, & J. A. Wellner (Eds.), High dimensional probability II (pp. 405–442). Boston: Birkhauser.
Google Scholar
DiMatteo, I., Genovese, C. R., & Kass, R. E. (2001). Bayesian curve-fitting with free-knot splines. Biometrika, 88(4), 1055–1071.
Article Google Scholar
Endres, D., Oram, M., Schindelin, J., & Foldiak, P. (2008). Bayesian binning beats approximate alternatives: Estimating peristimulus time histograms. Advances in Neural Information Processing Systems, 20, 393–400.
Google Scholar
Fan, J., Hall, P., Martin, M. A., & Patil, P. (1996). On local smoothing of nonparametric curve estimators. Journal of the American Statistical Association, 91, 258–266.
Article Google Scholar
Gerstein, G. L., & Kiang, N. Y. S. (1960). An approach to the quantitative analysis of electrophysiological data from single neurons. Biophysical Journal, 1(1), 15–28.
Article CAS PubMed Google Scholar
Hall, P., & Schucany, W. R. (1989). A local cross-validation algorithm. Statistics & Probability Letters, 8(2), 109–117.
Article Google Scholar
Jones, M., Marron, J., & Sheather, S. (1996). A brief survey of bandwidth selection for density estimation. Journal of the American Statistical Association, 91(433), 401–407.
Article Google Scholar
Kass, R. E., Ventura, V., & Brown, E. N. (2005). Statistical issues in the analysis of neuronal data. Journal of Neurophysiology, 94(1), 8–25.
Article PubMed Google Scholar
Kass, R. E., Ventura, V., & Cai, C. (2003). Statistical smoothing of neuronal data. Network-Computation in Neural Systems, 14(1), 5–15.
Article Google Scholar
Koyama, S., & Shinomoto, S. (2005). Empirical Bayes interpretations of random point events. Journal of Physics A-Mathematical and General, 38, 531–537.
Article Google Scholar
Loader, C. (1999a). Bandwidth selection: Classical or plug-in? The Annals of Statistics, 27(2), 415–438.
Article Google Scholar
Loader, C. (1999b). Local regression and likelihood. New York: Springer.
Loftsgaarden, D. O., & Quesenberry, C. P. (1965). A nonparametric estimate of a multivariate density function. The Annals of Mathematical Statistics, 36, 1049–1051.
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and its Applications, 9(1), 141–142.
Nawrot, M., Aertsen, A., & Rotter, S. (1999). Single-trial estimation of neuronal firing rates: From single-neuron spike trains to population activity. Journal of Neuroscience Methods, 94(1), 81–92.
Parzen, E. (1962). Estimation of a probability density-function and mode. The Annals of Mathematical Statistics, 33(3), 1065.
Richmond, B. J., Optican, L. M., & Spitzer, H. (1990). Temporal encoding of two-dimensional patterns by single units in primate primary visual cortex. I. Stimulus-response relations. Journal of Neurophysiology, 64(2), 351–369.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density-function. The Annals of Mathematical Statistics, 27(3), 832–837.
Rudemo, M. (1982). Empirical choice of histograms and kernel density estimators. Scandinavian Journal of Statistics, 9(2), 65–78.
Sain, S. R. (2002). Multivariate locally adaptive density estimation. Computational Statistics & Data Analysis, 39, 165–186.
Sain, S., & Scott, D. (1996). On locally adaptive density estimation. Journal of the American Statistical Association, 91(436), 1525–1534.
Sain, S., & Scott, D. (2002). Zero-bias locally adaptive density estimators. Scandinavian Journal of Statistics, 29(3), 441–460.
Sanderson, A. (1980). Adaptive filtering of neuronal spike train data. IEEE Transactions on Biomedical Engineering, 27, 271–274.
Scott, D. W. (1992). Multivariate density estimation: Theory, practice, and visualization. New York: Wiley-Interscience.
Scott, D. W., & Terrell, G. R. (1987). Biased and unbiased cross-validation in density estimation. Journal of the American Statistical Association, 82, 1131–1146.
Shimazaki, H., & Shinomoto, S. (2007). A method for selecting the bin size of a time histogram. Neural Computation, 19(6), 1503–1527.
Shimokawa, T., & Shinomoto, S. (2009). Estimating instantaneous irregularity of neuronal firing. Neural Computation, 21(7), 1931–1951.
Shinomoto, S., & Koyama, S. (2007). A solution to the controversy between rate and temporal coding. Statistics in Medicine, 26, 4032–4038.
Shinomoto, S., Kim, H., Shimokawa, T., Matsuno, N., Funahashi, S., Shima, K., et al. (2009). Relating neuronal firing patterns to functional differentiation of cerebral cortex. PLoS Computational Biology, 5, e1000433.
Shinomoto, S., Miyazaki, Y., Tamura, H., & Fujita, I. (2005). Regional and laminar differences in in vivo firing patterns of primate cortical neurons. Journal of Neurophysiology, 94(1), 567–575.
Shinomoto, S., Shima, K., & Tanji, J. (2003). Differences in spiking patterns among cortical neurons. Neural Computation, 15(12), 2823–2842.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London: Chapman & Hall.
Smith, A. C., & Brown, E. N. (2003). Estimating a state-space model from point process observations. Neural Computation, 15(5), 965–991.
Snyder, D. (1975). Random point processes. New York: Wiley.
Watson, G. S. (1964). Smooth regression analysis. Sankhya: The Indian Journal of Statistics, Series A, 26(4), 359–372.

Acknowledgements

We thank M. Nawrot, S. Koyama, and D. Endres for valuable discussions, and the Diesmann Unit for providing the computing environment. We also acknowledge K. H. Britten, M. N. Shadlen, W. T. Newsome, and J. A. Movshon, who made their data available to the public, and W. Bair for hosting the Neural Signal Archive. This study is supported in part by a Research Fellowship of the Japan Society for the Promotion of Science for Young Scientists to HS and by Grants-in-Aid for Scientific Research to SS from the MEXT Japan (20300083, 20020012).

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Author information

Authors and Affiliations

Grün Unit, RIKEN Brain Science Institute, Saitama, 351-0198, Japan
Hideaki Shimazaki
Department of Physics, Kyoto University, Kyoto, 606-8502, Japan
Shigeru Shinomoto

Corresponding author

Correspondence to Hideaki Shimazaki.

Additional information

Action Editor: Peter Latham

Appendix: Cost functions of the Gauss kernel function

In this appendix, we derive concrete forms of the MISE optimization algorithms developed in the body of the paper for the particular case of the Gauss density function, Eq. (3).

A.1 A cost function for a fixed bandwidth

The estimated cost function is obtained, as in Eq. (10):

$$ n^2 \hat{C}_n\left( w\right) =\sum_{i,j}\psi_{w}\left( t_{i},t_{j}\right) -2\sum_{i\neq j}k_{w}\left( t_{i}-t_{j}\right) , $$

where, from Eq. (11),

$$ \psi_{w}\left( t_{i},t_{j}\right) =\int_{a}^{b}k_{w}\left( t-t_{i}\right) k_{w}\left( t-t_{j}\right) \,dt. $$

A symmetric kernel function, including the Gauss function, is invariant to an exchange of $t_{i}$ and $t_{j}$ when computing $k_{w}(t_{i}-t_{j})$. In addition, the correlation of the kernel function, Eq. (11), is symmetric with respect to $t_{i}$ and $t_{j}$. Hence, we obtain the following relationships:

$$ \sum\limits_{i\neq j}k_{w}\left( t_{i}-t_{j}\right) =2\sum\limits_{i<j}k_{w}\left( t_{i}-t_{j}\right) , $$

(20)

$$ \sum_{i,j}\psi_{w}\left( t_{i},t_{j}\right) =\sum_{i}\psi_{w}\left( t_{i},t_{i}\right) +2\sum_{i<j}\psi_{w}\left( t_{i},t_{j}\right). $$

(21)

By plugging Eqs. (20) and (21) into Eq. (10), the cost function simplifies to

$$ \begin{array}{rcl} n^2 \hat{C}_n\left( w\right) &=& \sum\limits_{i}\psi_{w}\left( t_{i},t_{i}\right) \\ && +\,2\sum\limits_{i<j}\left\{ \psi_{w}\left( t_{i},t_{j}\right) -2k_{w}\left( t_{i}-t_{j}\right) \right\} . \end{array} $$

(22)

For the Gauss kernel function, Eq. (3), with bandwidth w, Eq. (11) becomes

$$ \begin{array}{rcl} \psi_{w}\left( t_{i},t_{j}\right) &=& \frac{1}{4\sqrt{\pi}\,w}e^{-\frac{(t_{i}-t_{j})^{2}}{4w^{2}}} \\ && \times \left\{\mathrm{erf}\left(\frac{2b-t_{i}-t_{j}}{2w}\right)-\mathrm{erf}\left(\frac{2a-t_{i}-t_{j}}{2w}\right)\right\}, \end{array} $$

(23)

where $\mathrm{erf}\left( z\right) =\frac{2}{\sqrt{\pi}}\int_{0}^{z}e^{-t^{2}}dt$. A simplified equation is obtained by evaluating the MISE in an unbounded domain: $a\to-\infty$ and $b\to+\infty$. Using $\mathrm{erf}\left( \pm\infty\right) =\pm1$, we obtain

$$ \psi_{w}\left( t_{i},t_{j}\right) \approx\frac{1}{2\sqrt{\pi}\,w} e^{-\frac{(t_{i}-t_{j})^{2}}{4w^{2}}}. $$

(24)
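As a numerical sanity check on Eqs. (23) and (24), the following minimal Python sketch compares the closed forms with a direct midpoint-rule evaluation of the integral in Eq. (11). The function names and the numerical values of $t_i$, $t_j$, $w$, $a$, and $b$ are hypothetical, chosen only for illustration; the Gauss kernel is assumed to be the normal density with standard deviation $w$.

```python
import math

def gauss_kernel(x, w):
    """Gauss density with standard deviation (bandwidth) w (assumed form of Eq. 3)."""
    return math.exp(-x * x / (2.0 * w * w)) / (math.sqrt(2.0 * math.pi) * w)

def psi_bounded(ti, tj, w, a, b):
    """Closed form of Eq. (23): kernel correlation integrated over [a, b]."""
    pref = math.exp(-(ti - tj) ** 2 / (4.0 * w * w)) / (4.0 * math.sqrt(math.pi) * w)
    return pref * (math.erf((2.0 * b - ti - tj) / (2.0 * w))
                   - math.erf((2.0 * a - ti - tj) / (2.0 * w)))

def psi_unbounded(ti, tj, w):
    """Closed form of Eq. (24): the limit a -> -inf, b -> +inf."""
    return math.exp(-(ti - tj) ** 2 / (4.0 * w * w)) / (2.0 * math.sqrt(math.pi) * w)

def psi_numeric(ti, tj, w, a, b, steps=2000):
    """Midpoint-rule evaluation of the integral in Eq. (11); for checking only."""
    h = (b - a) / steps
    total = 0.0
    for m in range(steps):
        t = a + (m + 0.5) * h
        total += gauss_kernel(t - ti, w) * gauss_kernel(t - tj, w)
    return h * total

# Hypothetical values, for illustration only.
ti, tj, w, a, b = 0.30, 0.45, 0.10, 0.0, 1.0
closed = psi_bounded(ti, tj, w, a, b)        # Eq. (23)
numeric = psi_numeric(ti, tj, w, a, b)       # direct integration of Eq. (11)
wide = psi_bounded(ti, tj, w, -50.0, 50.0)   # very wide interval, approaches Eq. (24)
print(closed, numeric, wide, psi_unbounded(ti, tj, w))
```

When both spikes lie several bandwidths away from the boundaries, as in this example, the bounded and unbounded forms are nearly indistinguishable, which is the situation in which Eq. (24) is a reasonable replacement for Eq. (23).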

Using Eq. (24) in Eq. (22) and multiplying by $2\sqrt{\pi}$, we obtain a formula for selecting the bandwidth of the Gauss kernel function, $2\sqrt{\pi}\, n^2 \hat{C}_n(w)$, in which $N$ is the total number of spikes $t_{i}$ entering the sums:

$$ \frac{N}{w}+\frac{2}{w}\sum_{i<j}\left\{e^{-\frac{\left( t_{i}-t_{j}\right) ^{2}}{4w^{2}}}-2\sqrt{2}e^{-\frac{\left( t_{i} -t_{j}\right) ^{2}}{2w^{2}}}\right\}. $$

(25)
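Equation (25) can be evaluated directly on a list of spike times. The sketch below is one possible pure-Python transcription, paired with a plain grid search over candidate bandwidths. The function names, the spike times, and the candidate grid are hypothetical; the grid search is a simple stand-in and not necessarily the search procedure used in the body of the paper. Because the factor $2\sqrt{\pi}\,n^2$ is a positive constant, minimizing Eq. (25) selects the same $w$ as minimizing $\hat{C}_n(w)$.

```python
import math

def gauss_cost(spikes, w):
    """Eq. (25): 2*sqrt(pi)*n^2*C_n(w) for a list of spike times and bandwidth w."""
    total = 0.0
    for i in range(len(spikes)):
        for j in range(i + 1, len(spikes)):
            d2 = (spikes[i] - spikes[j]) ** 2
            total += (math.exp(-d2 / (4.0 * w * w))
                      - 2.0 * math.sqrt(2.0) * math.exp(-d2 / (2.0 * w * w)))
    # N / w is the diagonal contribution; the pair sum runs over i < j only.
    return len(spikes) / w + 2.0 * total / w

def select_bandwidth(spikes, candidates):
    """Return the candidate bandwidth with the smallest cost (plain grid search)."""
    return min(candidates, key=lambda w: gauss_cost(spikes, w))

# Hypothetical spike times (in seconds) and candidate grid, for illustration only.
spikes = [0.11, 0.13, 0.18, 0.42, 0.45, 0.47, 0.52, 0.80, 0.83, 0.91]
candidates = [0.005 * k for k in range(1, 101)]
best_w = select_bandwidth(spikes, candidates)
print(best_w)
```

The double loop costs $O(N^2)$ per candidate bandwidth, which is acceptable for a sketch; for long recordings a vectorized or truncated pair sum would be a natural refinement.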

A.2 A local cost function for a variable bandwidth

The local cost function is obtained, as in Eq. (15):

$$ n^2 \hat{C}_n^t\left( w,W\right) =\sum_{i,j}\psi_{w,W}^{t}\left( t_{i} ,t_{j}\right) -2\sum_{i\neq j}k_{w}\left( t_{i}-t_{j}\right) \rho _{W}^{t_{i}-t}, $$

where, from Eq. (16),

$$ \psi_{w,W}^{t}\left( t_{i},t_{j}\right) =\int k_{w}(u-t_{i})\,k_{w}(u-t_{j})\,\rho_{W}^{u-t}\,du. $$

For the summations in the local cost function, Eq. (15), we have equalities:

$$ \sum\limits_{i\neq j}k_{w}\left( t_{i}-t_{j}\right) \rho_{W}^{t_{i}-t}=\sum\limits_{i<j}k_{w}\left( t_{i}-t_{j}\right) \left\{\rho_{W}^{t_{i}-t}+\rho_{W}^{t_{j}-t}\right\}, $$

(26)

$$ \sum_{i,j}\psi_{w,W}^{t}\left( t_{i},t_{j}\right) =\sum_{i}\psi_{w,W}^{t}\left( t_{i},t_{i}\right) +2\sum_{i<j}\psi_{w,W}^{t}\left( t_{i},t_{j}\right) . $$

(27)

The first equation holds for a symmetric kernel. The second equation follows because Eq. (16) is invariant to an exchange of $t_{i}$ and $t_{j}$. Using Eqs. (26) and (27), Eq. (15) can be computed as

$$ \begin{array}{lll} n^2 \hat{C}_n^t\left( w,W\right) &=& \sum\limits_{i}\psi_{w,W}^{t}\left( t_{i},t_{i}\right) \\ && +\,2\sum\limits_{i<j}\left[\psi_{w,W}^{t}\left( t_{i},t_{j}\right) -k_{w}\left( t_{i}-t_{j}\right) \left\{\rho_{W}^{t_{i}-t}+\rho_{W}^{t_{j}-t}\right\}\right]. \end{array} $$

(28)

For the Gauss kernel function and the Gauss weight function with bandwidths $w$ and $W$, respectively, Eq. (16) is calculated as

$$ \begin{array}{lll} \psi_{w,W}^{t}\left( t_{i},t_{j}\right) &=& \frac{1}{2\pi w^{2}}\frac{1}{\sqrt{2\pi}\,W} \\ && \times \int\exp\left[-\frac{\left( u-t_{i}\right)^{2}+\left( u-t_{j}\right)^{2}}{2w^{2}} - \frac{\left( u-t\right)^{2}}{2W^{2}}\right]du. \end{array} $$

(29)

By completing the square with respect to u, the exponent in the above equation is written as

$$ \begin{array}{rcl} && -\frac{w^{2}+2W^{2}}{2w^{2}W^{2}}\left\{ u-\frac{\left( t_{i}+t_{j}\right) W^{2}+t\,w^{2}}{w^{2}+2W^{2}}\right\} ^{2} \\ && -\frac{\left[(t-t_{i})^{2}+(t-t_{j})^{2}\right]w^{2}+(t_{i}-t_{j})^{2}W^{2}}{2w^{2}\left( w^{2}+2W^{2}\right) }. \end{array} $$

(30)
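For readers who wish to verify the completion of the square, the intermediate steps can be written out as follows. The shorthand $v=u-t$, $x_{i}=t_{i}-t$, $x_{j}=t_{j}-t$, and $A$ is introduced here only for this check and is not part of the original notation. Expanding the exponent of Eq. (29) in powers of $v$ gives

$$ -\frac{(v-x_{i})^{2}+(v-x_{j})^{2}}{2w^{2}}-\frac{v^{2}}{2W^{2}} =-Av^{2}+\frac{x_{i}+x_{j}}{w^{2}}\,v-\frac{x_{i}^{2}+x_{j}^{2}}{2w^{2}}, \qquad A=\frac{w^{2}+2W^{2}}{2w^{2}W^{2}}. $$

Completing the square in $v$ leaves a squared term with coefficient $-A$ and a constant remainder,

$$ \frac{(x_{i}+x_{j})^{2}W^{2}}{2w^{2}\left( w^{2}+2W^{2}\right) }-\frac{x_{i}^{2}+x_{j}^{2}}{2w^{2}} =-\frac{\left( x_{i}^{2}+x_{j}^{2}\right) w^{2}+(x_{i}-x_{j})^{2}W^{2}}{2w^{2}\left( w^{2}+2W^{2}\right) }. $$

Since $x_{i}-x_{j}=t_{i}-t_{j}$, this remainder is the constant term of Eq. (30). The location of the center of the squared term does not affect the value of the Gaussian integral over the whole real line, so only $A$ and the constant remainder enter the final result.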

Using the formula $\int e^{-Au^{2}}\,du=\sqrt{\frac{\pi}{A}}$, Eq. (16) is obtained as

$$ \begin{array}{rcl} \psi_{w,W}^{t}\left( t_{i},t_{j}\right) &=& \frac{1}{2\pi w\sqrt{w^{2}+2W^{2}}} \\ && \times \exp\left[-\frac{\left[(t-t_{i})^{2}+(t-t_{j})^{2}\right]w^{2}+(t_{i}-t_{j})^{2}W^{2}}{2w^{2}\left( w^{2}+2W^{2}\right) }\right]. \end{array} $$

(31)
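The closed form in Eq. (31) and the local cost in Eq. (28) can be sketched in a few lines of pure Python. In the sketch below, the function names and all numerical values are hypothetical, and both the kernel $k_w$ and the weight $\rho_W$ are assumed to be normalized Gauss densities, consistent with the prefactors in Eq. (29). A midpoint-rule integration of Eq. (29) is included only to check the closed form.

```python
import math

def gauss(x, s):
    """Normalized Gauss density with standard deviation s."""
    return math.exp(-x * x / (2.0 * s * s)) / (math.sqrt(2.0 * math.pi) * s)

def psi_local(t, ti, tj, w, W):
    """Closed form of Eq. (31)."""
    s2 = w * w + 2.0 * W * W
    num = ((t - ti) ** 2 + (t - tj) ** 2) * w * w + (ti - tj) ** 2 * W * W
    return math.exp(-num / (2.0 * w * w * s2)) / (2.0 * math.pi * w * math.sqrt(s2))

def psi_local_numeric(t, ti, tj, w, W, lo, hi, steps=4000):
    """Midpoint-rule evaluation of the integral in Eq. (29); for checking only."""
    h = (hi - lo) / steps
    total = 0.0
    for m in range(steps):
        u = lo + (m + 0.5) * h
        total += gauss(u - ti, w) * gauss(u - tj, w) * gauss(u - t, W)
    return h * total

def local_cost(t, spikes, w, W):
    """Eq. (28): n^2 times the local cost at time t, using diagonal terms psi(t_i, t_i)."""
    cost = sum(psi_local(t, ti, ti, w, W) for ti in spikes)
    for i in range(len(spikes)):
        for j in range(i + 1, len(spikes)):
            ti, tj = spikes[i], spikes[j]
            cost += 2.0 * (psi_local(t, ti, tj, w, W)
                           - gauss(ti - tj, w) * (gauss(ti - t, W) + gauss(tj - t, W)))
    return cost

# Hypothetical values, for illustration only.
t, w, W = 0.40, 0.10, 0.20
spikes = [0.10, 0.30, 0.45, 0.70]
print(psi_local(t, 0.30, 0.45, w, W),
      psi_local_numeric(t, 0.30, 0.45, w, W, -2.0, 3.0))
print(local_cost(t, spikes, w, W))
```

Evaluating `local_cost` over a grid of $w$ at each time $t$, for a given $W$, yields the local cost landscape from which a time-dependent bandwidth could be read off; how $W$ and the stiffness constant are then chosen is described in the body of the paper and is not reproduced here.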

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Shimazaki, H., Shinomoto, S. Kernel bandwidth optimization in spike rate estimation. J Comput Neurosci 29, 171–182 (2010). https://doi.org/10.1007/s10827-009-0180-4

Received: 22 December 2008
Revised: 20 May 2009
Accepted: 23 July 2009
Published: 05 August 2009
Issue Date: August 2010
DOI: https://doi.org/10.1007/s10827-009-0180-4

1 Introduction