# Kernel bandwidth optimization in spike rate estimation

## Authors

- First Online:

- Received:
- Revised:
- Accepted:

DOI: 10.1007/s10827-009-0180-4

- Cite this article as:
- Shimazaki, H. & Shinomoto, S. J Comput Neurosci (2010) 29: 171. doi:10.1007/s10827-009-0180-4

- 63 Citations
- 3k Views

## Abstract

Kernel smoother and a time-histogram are classical tools for estimating an instantaneous rate of spike occurrences. We recently established a method for selecting the bin width of the time-histogram, based on the principle of minimizing the mean integrated square error (MISE) between the estimated rate and unknown underlying rate. Here we apply the same optimization principle to the kernel density estimation in selecting the width or “bandwidth” of the kernel, and further extend the algorithm to allow a variable bandwidth, in conformity with data. The variable kernel has the potential to accurately grasp non-stationary phenomena, such as abrupt changes in the firing rate, which we often encounter in neuroscience. In order to avoid possible overfitting that may take place due to excessive freedom, we introduced a stiffness constant for bandwidth variability. Our method automatically adjusts the stiffness constant, thereby adapting to the entire set of spike data. It is revealed that the classical kernel smoother may exhibit goodness-of-fit comparable to, or even better than, that of modern sophisticated rate estimation methods, provided that the bandwidth is selected properly for a given set of spike data, according to the optimization methods presented here.

### Keywords

Kernel density estimationBandwidth optimizationMean integrated squared error## 1 Introduction

Neurophysiologists often investigate responses of a single neuron to a stimulus presented to an animal by using the discharge rate of action potentials, or spikes (Adrian 1928; Gerstein and Kiang 1960; Abeles 1982). One classical method for estimating spike rate is the kernel density estimation (Parzen 1962; Rosenblatt 1956; Sanderson 1980; Richmond et al. 1990; Nawrot et al. 1999). In this method, a spike sequence is convoluted with a kernel function, such as a Gauss density function, to obtain a smooth estimate of the firing rate. The estimated rate is sometimes referred to as a spike density function. This nonparametric method is left with a free parameter for kernel bandwidth that determines the goodness-of-fit of the density estimate to the unknown rate underlying data. Although theories have been suggested for selecting the bandwidth, cross-validating with the data (Rudemo 1982; Bowman 1984; Silverman 1986; Scott and Terrell 1987; Scott 1992; Jones et al. 1996; Loader 1999a, b), individual researchers have mostly chosen bandwidth arbitrarily. This is partly because the theories have not spread to the neurophysiological society, and partly due to inappropriate basic assumptions of the theories themselves. Most optimization methods assume a stationary rate fluctuation, while the neuronal firing rate often exhibits abrupt changes, to which neurophysiologists, in particular, pay attention. A fixed bandwidth, optimized using a stationary assumption, is too wide to extract the details of sharp activation, while in the silent period, the fixed bandwidth would be too narrow and may cause spurious undulation in the estimated rate. It is therefore desirable to allow a variable bandwidth, in conformity with data.

The idea of optimizing bandwidth at every instant was proposed by Loftsgaarden and Quesenberry (1965). However, in contrast to the progress in methods that vary bandwidths at sample points only (Abramson 1982; Breiman et al. 1977; Sain and Scott 1996; Sain 2002; Brewer 2004), the local optimization of bandwidth at every instant turned out to be difficult because of its excessive freedom (Scott 1992; Devroye and Lugosi 2000; Sain and Scott 2002). In earlier studies, Hall & Schucany used the cross-validation of Rudemo and Bowman, within local intervals (Hall and Schucany 1989), yet the interval length was left free. Fan et al. applied cross-validation to locally optimized bandwidth (Fan et al. 1996), and yet the smoothness of the variable bandwidth was chosen manually.

In this study, we first revisit the fixed kernel method, and derive a simple formula to select the bandwidth of the kernel density estimation, similar to the previous method for selecting the bin width of a peristimulus time histogram (See Shimazaki and Shinomoto 2007). Next, we introduce the variable bandwidth into the kernel method and derive an algorithm for determining the bandwidth locally in time. The method automatically adjusts the flexibility, or the stiffness, of the variable bandwidth. The performance of our fixed and variable kernel methods are compared with established density estimation methods, in terms of the goodness-of-fit to underlying rates that vary either continuously or discontinuously. We also apply our kernel methods to the biological data, and examine their ability by cross-validating with data.

Though our methods are based on the classical kernel method, their performances are comparable to various sophisticated rate estimation methods. Because of the classics, they are rather convenient for users: the methods simply suggest bandwidth for the standard kernel density estimation.

## 2 Methods

### 2.1 Kernel smoothing

*raw density*, as

*n*is the number of repeated trials. Here, each spike is regarded as a point event that occurs at an instant of time

*t*

_{i}(

*i*= 1,2, ⋯ ,

*N*) and is represented by the Dirac delta function

*δ*(

*t*). The kernel density estimate is obtained by convoluting a kernel

*k*(

*s*) to the raw density

*x*

_{t},

*w*is specified as a subscript. In the body of this study, we develop optimization methods that apply generally to any kernel function, and derive a specific algorithm for the Gauss density function in the Appendix.

### 2.2 Mean integrated squared error optimization principle

*λ*

_{t}. Among several plausible optimizing principles, such as the Kullback-Leibler divergence or the Hellinger distance, we adopt, here, the mean integrated squared error (MISE) for measuring the goodness-of-fit of an estimate to the unknown underlying rate, as

*E*refers to the expectation with respect to the spike generation process under a given inhomogeneous rate

*λ*

_{t}. It follows, by definition, that

*Ex*

_{t}=

*λ*

_{t}.

*λ*

_{t}. Spikes recorded from a single neuron correlate in each sequence (Shinomoto et al. 2003, 2005, 2009). In the limit of a large number of spike trains, however, mixed spikes are statistically independent and the superimposed sequence can be approximated as a single inhomogeneous Poisson point process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005).

### 2.3 Selection of the fixed bandwidth

Given a kernel function such as Eq. (3), the density function Eq. (2) is uniquely determined for a raw density Eq. (1) of spikes obtained from an experiment. A bandwidth *w* of the kernel may alter the density estimate, and it can accordingly affect the goodness-of-fit of the density function \(\hat{\lambda}_{t}\) to the unknown underlying rate *λ*_{t}. In this subsection, we consider applying a kernel of a fixed bandwidth *w*, and develop a method for selecting *w* that minimizes the MISE, Eq. (4).

*w*:

*λ*

_{t}=

*Ex*

_{t}, the integrand of the second term in Eq. (5) is given as

*w*

^{ ∗ }. The method for selecting a fixed kernel bandwidth is summarized in Algorithm 1. A particular algorithm developed for the Gauss density function is given in the Appendix.

### 2.4 Selection of the variable bandwidth

*a*,

*b*] . For a non-stationary case, in which the degree of rate fluctuation greatly varies in time, the rate estimation may be improved by using a kernel function whose bandwidth is adaptively selected in conformity with data. The spike rate estimated with the variable bandwidth

*w*

_{t}is given by

*w*

_{t}as a fixed bandwidth optimized in a local interval. In this approach, the interval length for the local optimization regulates the shape of the function

*w*

_{t}, therefore, it subsequently determines the goodness-of-fit of the estimated rate to the underlying rate. We provide a method for obtaining the variable bandwidth

*w*

_{t}that minimizes the MISE by optimizing the local interval length.

*t*as

*w*. Here, a weight function \(\rho _{W}^{u-t}\) localizes the integration of the squared error in a characteristic interval

*W*centered at time

*t*. An example of the weight function is once again the Gauss density function. See the Appendix for the specific algorithm for the Gauss weight function. As in Eq. (5), we introduce the local cost function at time

*t*by subtracting the term irrelevant for the choice of

*w*as

*w*

^{ ∗ }is obtained as a minimizer of the estimated cost function:

*W*, the optimal bandwidth

*w*

^{ ∗ }varies. We suggest selecting an interval length that scales with the optimal bandwidth as

*γ*

^{ − 1}

*w*

^{ ∗ }. The parameter

*γ*regulates the interval length for local optimization: With small

*γ*( ≪ 1) , the fixed bandwidth is optimized within a long interval; With large

*γ*(~1) , the fixed bandwidth is optimized within a short interval. The interval length and fixed bandwidth, selected at time

*t*, are denoted as \(W_{t}^{\gamma}\) and \(\bar{w}_{t}^{\gamma}\).

*t*( ∈ [

*a*,

*b*]). Because the intervals overlap, we adopt the Nadaraya-Watson kernel regression (Nadaraya 1964; Watson 1964) of \(\bar {w}_{t}^{\gamma}\) as a local bandwidth at time

*t*:

*γ*, exhibits different degrees of smoothness: With small

*γ*( ≪ 1) , the variable bandwidth fluctuates slightly; With large

*γ*(~1) , the variable bandwidth fluctuates significantly. The parameter

*γ*is thus a smoothing parameter for the variable bandwidth. Similar to the fixed bandwidth, the goodness-of-fit of the variable bandwidth can be estimated from the data. The cost function for the variable bandwidth selected with

*γ*is obtained as

*γ*

^{ ∗ }that minimizes Eq. (18), local optimization is performed in an ideal interval length. The method for optimizing the variable kernel bandwidth is summarized in Algorithm 2.

## 3 Results

### 3.1 Comparison of the fixed and variable kernel methods

By using spikes sampled from an inhomogeneous Poisson point process, we examined the efficiency of the kernel methods in estimating the underlying rate. We also used a sequence obtained by superimposing ten non-Poissonian (gamma) sequences (Shimokawa and Shinomoto 2009), but there was practically no significant difference in the rate estimation from the Poissonian sequence.

Figure 1 displays the result of the fixed kernel method based on the Gauss density function. The kernel bandwidth selected by Algorithm 1 applies a reasonable filtering to the set of spike sequences. Figure 1(d) shows that a cost function, Eq. (10), estimated from the spike data is similar to the original MISE, Eq. (4), which was computed using the knowledge of the underlying rate. This demonstrates that MISE optimization can, in practice, be carried out by our method, even without knowing the underlying rate.

*γ*for the bandwidth variation is selected by minimizing the cost function, as shown in Fig. 2(c).

### 3.2 Comparison with established density estimation methods

The established density estimation methods examined for comparison are the histogram (Shimazaki and Shinomoto 2007), Abramson’s adaptive kernel (Abramson 1982), Locfit (Loader 1999b), and Bayesian adaptive regression splines (BARS) (DiMatteo et al. 2001; Kass et al. 2003) methods, whose details are summarized below.

A histogram method, which is often called a peristimulus time histogram (PSTH) in neurophysiological literature, is the most basic method for estimating the spike rate. To optimize the histogram, we used a method proposed for selecting the bin width based on the MISE principle (Shimazaki and Shinomoto 2007).

Abramson’s adaptive kernel method (Abramson 1982) uses the sample point kernel estimate \(\hat \lambda_t = \sum_i k_{w_{t_i}} (t - t_i)\), in which the bandwidths are adapted at the sample points. Scaling the bandwidths as \(w_{t_i} = w \, (g / \hat \lambda_{t_i} )^{1/2} \) was suggested, where *w* is a pilot bandwidth, \(g= ( \prod\nolimits_i \hat \lambda_{t_i} )^{1/N}\), and \(\hat\lambda_t\) is a fixed kernel estimate with *w*. Abramson’s method is a two-stage method, in which the pilot bandwidth needs to be selected beforehand. Here, the pilot bandwidth is selected using the fixed kernel optimization method developed in this study.

The Locfit algorithm developed by Loader (1999b) fits a polynomial to a log-density function under the principle of maximizing a locally defined likelihood. We examined the automatic choice of the adaptive bandwidth of the local likelihood, and found that the default fixed method yielded a significantly better fit. We used a nearest neighbor based bandwidth method, with a parameter covering 20% of the data.

The BARS (DiMatteo et al. 2001; Kass et al. 2003) is a spline-based adaptive regression method on an exponential family response model, including a Poisson count distribution. The rate estimated with the BARS is the expected splines computed from the posterior distribution on the knot number and locations with a Markov chain Monte Carlo method. The BARS is, thus, capable of smoothing a noisy histogram without missing abrupt changes. To create an initial histogram, we used 4 [ms] bin width, which is small enough to examine rapid changes in the firing rate.

Figure 3(a) displays the density profiles of the six different methods estimated from an identical set of spike trains (*n* = 10) that are numerically sampled from a sinusoidal or sawtooth underlying rate (2 [s]). Figure 3(b) summarizes the goodness-of-fit of the six methods to the sinusoidal and sawtooth rates (10 [s]) by averaging over 20 realizations of samples.

For the sinusoidal rate function, representing continuously varying rate processes, the BARS is most efficient in terms of ISE performance. For the sawtooth rate function, representing discontinuous non-stationary rate processes, the variable kernel estimation developed here is the most efficient in grasping abrupt rate changes. The histogram method is always inferior to the other five methods in terms of ISE performance, due to the jagged nature of the piecewise constant function.

### 3.3 Application to experimental data

*n*= 1, 10, and 30 experimental trials are shown in Fig. 4. Fine details of rate modulation are revealed as we increase the sample size (Bair and Koch 1996). The fixed kernel method tends to choose narrower bandwidths, while the variable kernel method tends to choose wider bandwidths in the periods in which spikes are not abundant.

*w*

_{t}for both fixed and variable, is obtained with a training data set of

*n*trials. The error is evaluated by computing the cost function, Eq. (18), in a cross-validatory manner:

*n*spike sequences in the leftovers, and \(\hat \lambda _t \!=\! \frac{1}{n} \sum_i k_{w_t} (t \!-\! t^\prime_i)\). Figure 4(d) shows the performance improvements by the variable bandwidth over the fixed bandwidth, as evaluated by Eq. (19). The fixed and variable kernel methods perform better for smaller and larger sizes of data, respectively. In addition, we compared the fixed kernel method and the BARS by cross-validating the log-likelihood of a Poisson process with the rate estimated using the two methods. The difference in the log-likelihoods was not statistically significant for small samples (

*n*= 1, 5 and 10), while the fixed kernel method fitted better to the spike data with larger samples (

*n*= 20 and 30).

## 4 Discussion

In this study, we developed methods for selecting the kernel bandwidth in the spike rate estimation based on the MISE minimization principle. In addition to the principle of optimizing a fixed bandwidth, we further considered selecting the bandwidth locally in time, assuming a non-stationary rate modulation.

We tested the efficiency of our methods using spike sequences numerically sampled from a given rate (Figs. 1 and 2). Various density estimators constructed on different optimization principles were compared in their goodness-of-fit to the underlying rate (Fig. 3). There is in fact no oracle that selects one among various optimization principles, such as MISE minimization or likelihood maximization. Practically, reasonable principles render similar detectability for rate modulation; the kernel methods based on MISE were roughly comparable to the Locfit based on likelihood maximization in their performances. The difference of the performances is not due to the choice of principles, but rather due to techniques; kernel and histogram methods lead to completely different results under the same MISE minimization principle (Fig. 3(b)). Among the smooth rate estimators, the BARS was good at representing continuously varying rate, while the variable kernel method was good at grasping abrupt changes in the rate process (Fig. 3(b)).

We also examined the performance of our methods in application to neuronal spike sequences by cross-validating with the data (Fig. 4). The result demonstrated that the fixed kernel method performed well in small samples. We refer to Cunningham et al. (2008) for a result on the superior fitting performance of a fixed kernel to small samples in comparison with the Locfit and BARS, as well as the Gaussian process smoother (Cunningham et al. 2008; Smith and Brown 2003; Koyama and Shinomoto 2005). The adaptive methods, however, have the potential to outperform the fixed method with larger samples derived from a non-stationary rate profile (See also Endres et al. 2008 for comparisons of their adaptive histogram with the fixed histogram and kernel method). The result in Fig. 4 confirmed the utility of our variable kernel method for larger samples of neuronal spikes.

We derived the optimization methods under the Poisson assumption, so that spikes are randomly drawn from a given rate. If one wishes to estimate spike rate of a single or a few sequences that contain strongly correlated spikes, it is desirable to utilize the information as to non-Poisson nature of a spike train (Cunningham et al. 2008). Note that a non-Poisson spike train may be dually interpreted, as being derived either irregularly from a constant rate, or regularly from a fluctuating rate (Koyama and Shinomoto 2005; Shinomoto and Koyama 2007). However, a sequence obtained by superimposing many spike trains is approximated as a Poisson process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005), for which dual interpretation does not occur. Thus the kernel methods developed in this paper are valid for the superimposed sequence, and serve as the peristimulus density estimator for spike trains aligned at the onset or offset of the stimulus.

Kernel smoother is a classical method for estimating the firing rate, as popular as the histogram method. We have shown in this paper that the classical kernel methods perform well in the goodness-of-fit to the underlying rate. They are not only superior to the histogram method, but also comparable to modern sophisticated methods, such as the Locfit and BARS. In particular, the variable kernel method outperformed competing methods in representing abrupt changes in the spike rate, which we often encounter in neuroscience. Given simplicity and familiarity, the kernel smoother can still be the most useful in analyzing the spike data, provided that the bandwidth is chosen appropriately as instructed in this paper.

## Acknowledgements

We thank M. Nawrot, S. Koyama, D. Endres for valuable discussions, and the Diesmann Unit for providing the computing environment. We also acknowledge K. H. Britten, M. N. Shadlen, W. T. Newsome, and J. A. Movshon, who made their data available to the public, and W. Bair for hosting the Neural Signal Archive. This study is supported in part by a Research Fellowship of the Japan Society for the Promotion of Science for Young Scientists to HS and Grants-in-Aid for Scientific Research to SS from the MEXT Japan (20300083, 20020012).

**Open Access**

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.