Kernel smoothing
In neurophysiological experiments, neuronal response is examined by repeatedly applying identical stimuli. The recorded spike trains are aligned at the onset of stimuli, and superimposed to form a raw density, as
$$ x_{t}=\frac{1}{n} \sum_{i=1}^{N}{{\delta\left( {t-t}_{i}\right) }}, \label{rawdensity} $$
(1)
where n is the number of repeated trials. Here, each spike is regarded as a point event that occurs at an instant of time t
i
(i = 1,2, ⋯ , N) and is represented by the Dirac delta function δ(t). The kernel density estimate is obtained by convoluting a kernel k(s) to the raw density x
t
,
$$ \hat{\lambda}_{t}=\int x_{t-s} k{\left( {s}\right) }\,ds. \label{kerneldensity} $$
(2)
Throughout this study, the integral \(\int\) that does not specify bounds refers to \(\int_{-\infty}^{\infty}\). The kernel function satisfies the normalization condition, \(\int k(s) \,ds=1\), a zero first moment, \(\int s k(s) \, ds=0\), and has a finite bandwidth, \(w^2 = \int s^2 k(s) \,ds < \infty\). A frequently used kernel is the Gauss density function,
$$ k_{w}(s) = \frac{1}{\sqrt{2 \pi }w} \exp{\left( -\frac{s^2}{2 w^2} \right)}, \label{gaussdensity} $$
(3)
where the bandwidth w is specified as a subscript. In the body of this study, we develop optimization methods that apply generally to any kernel function, and derive a specific algorithm for the Gauss density function in the Appendix.
Mean integrated squared error optimization principle
Assuming that spikes are sampled from a stochastic process, we consider optimizing the estimate \(\hat{\lambda}_{t}\) to be closest to the unknown underlying rate λ
t
. Among several plausible optimizing principles, such as the Kullback-Leibler divergence or the Hellinger distance, we adopt, here, the mean integrated squared error (MISE) for measuring the goodness-of-fit of an estimate to the unknown underlying rate, as
$$ \mbox{MISE}=\int_{a}^{b}E(\hat{\lambda}_{t}-\lambda_{t})^{2} \,dt, \label{eq:MISE} $$
(4)
where E refers to the expectation with respect to the spike generation process under a given inhomogeneous rate λ
t
. It follows, by definition, that Ex
t
= λ
t
.
In deriving optimization methods, we assume the Poisson nature, so that spikes are randomly sampled at a given rate λ
t
. Spikes recorded from a single neuron correlate in each sequence (Shinomoto et al. 2003, 2005, 2009). In the limit of a large number of spike trains, however, mixed spikes are statistically independent and the superimposed sequence can be approximated as a single inhomogeneous Poisson point process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005).
Selection of the fixed bandwidth
Given a kernel function such as Eq. (3), the density function Eq. (2) is uniquely determined for a raw density Eq. (1) of spikes obtained from an experiment. A bandwidth w of the kernel may alter the density estimate, and it can accordingly affect the goodness-of-fit of the density function \(\hat{\lambda}_{t}\) to the unknown underlying rate λ
t
. In this subsection, we consider applying a kernel of a fixed bandwidth w, and develop a method for selecting w that minimizes the MISE, Eq. (4).
The integrand of the MISE is decomposed into three parts: \(E\hat{\lambda}_{t}^{2}-2\lambda_{t}E\hat{\lambda}_{t}+\lambda_{t}^{2}\). Since the last component does not depend on the choice of a kernel, we subtract it from the MISE, then define a cost function as a function of the bandwidth w:
$$\begin{array}{rcl} C_{n}\left( w\right) & = & \mbox{MISE}-\int_{a}^{b}\lambda_{t} ^{2}\,dt \\ & =& \int_{a}^{b}E\hat{\lambda}_{t}^{2}\,dt-2\int_{a}^{b}\lambda_{t} E\hat{\lambda}_{t}\,dt.\label{eq:Cost_Function1} \end{array} $$
(5)
Rudemo and Bowman suggested the leave-one-out cross-validation to estimate the second term of Eq. (5) (Rudemo 1982; Bowman 1984). Here, we directly estimate the second term with the Poisson assumption (See also Shimazaki and Shinomoto 2007).
By noting that λ
t
= Ex
t
, the integrand of the second term in Eq. (5) is given as
$$ E x_{t} E \hat{\lambda}_{t} = E\big[x_{t}\hat{\lambda}_{t}\big] -E\big[(x_{t}-Ex_{t})\big(\hat{\lambda}_{t}-E\hat{\lambda}_{t}\big)\big], \label{correlation_decomposition} $$
(6)
from a general decomposition of covariance of two random variables. Using Eq. (2), the covariance (the second term of Eq. (6)) is obtained as
$$\begin{array}{rcl} &&{\kern-6pt} E\big[(x_{t}-Ex_{t})\big(\hat{\lambda}_{t}-E\hat{\lambda} _{t}\big)\big] \\ &&{\kern12pt} = \int k_w \left( {t-s}\right) E\left[ (x_{t}-Ex_{t}) \left( x_{s}-Ex_{s}\right) \right] \,ds \\ &&{\kern12pt} = \int{k}_{w}{\left( {t-s}\right) }\left[ \delta\left( t-s\right) \frac{1}{n}Ex_{s}\right] \,ds \\ &&{\kern12pt} =\frac{{1}}{n} {k}_{w}(0)Ex_{t}.\label{eq:covariance} \end{array} $$
(7)
Here, to obtain the second equality, we used the assumption of the Poisson point process (independent spikes).
Using Eqs. (6) and (7), Eq. (5) becomes
$$\begin{array}{rcl} C_{n}\left( w\right) &=&\int_{a}^{b}E\hat{\lambda}_{t}^{2}\,dt \\ && -2\int_{a}^{b}\left\{E\big[x_{t}\hat{\lambda}_{t}\big]-\frac{1}{n}k_{w}(0)Ex_{t}\right\} \,dt. \label{eq:Cost_Function_Fix2} \end{array} $$
(8)
Equation (8) is composed of observable variables only. Hence, from sample sequences, the cost function is estimated as
$$ \hat{C}_{n}\left( w\right) =\int_{a}^{b}\hat{\lambda}_{t}^{2}\,dt-2\int _{a}^{b}\left\{x_{t}\hat{\lambda}_{t}-\frac{1}{n}k_{w}(0)x_{t}\right\} \,dt. \label{eq:Cost_Function_Fix_Estimated1} $$
(9)
In terms of a kernel function, the cost function is written as
$$\begin{array}{rcl} \hat{C}_{n}\left( w\right) &=& \frac{1}{n^{2}}\sum\limits_{i,j}\psi_{w}\left( t_{i},t_{j}\right) \\ && -\frac{2}{n^{2}}\left\{\sum\limits_{i,j}k_{w}\left( t_{i}-t_{j}\right) -k_{w} (0)N\,\right\} \\ &=& \frac{1}{n^2} \sum\limits_{i,j}\psi_{w}\left( t_{i},t_{j}\right) -\frac{2}{n^2} \sum\limits_{i\neq j}k_{w}\left( t_{i}-t_{j}\right), \label{eq:Cost_Function_Fix_Estimated} \end{array} $$
(10)
where
$$ \psi_{w}\left( t_{i},t_{j}\right) =\int_{a}^{b}{k}_{w}{\left( t{-} t_{i}\right) k}_{w}{\left( t{-}t_{j}\right) }\,dt.\label{eq:phai_fix} $$
(11)
The minimizer of the cost function, Eq. (10), is an estimate of the optimal bandwidth, which is denoted by w
∗ . The method for selecting a fixed kernel bandwidth is summarized in Algorithm 1. A particular algorithm developed for the Gauss density function is given in the Appendix.
Selection of the variable bandwidth
The method described in Section 2.3 aims to select a single bandwidth that optimizes the goodness-of-fit of the rate estimate for an entire observation interval [a, b] . For a non-stationary case, in which the degree of rate fluctuation greatly varies in time, the rate estimation may be improved by using a kernel function whose bandwidth is adaptively selected in conformity with data. The spike rate estimated with the variable bandwidth w
t
is given by
$$ \hat{\lambda}_{t}=\int x_{t-s} k_{w_{t}}\left( s \right)\,ds. $$
(12)
Here we select the variable bandwidth w
t
as a fixed bandwidth optimized in a local interval. In this approach, the interval length for the local optimization regulates the shape of the function w
t
, therefore, it subsequently determines the goodness-of-fit of the estimated rate to the underlying rate. We provide a method for obtaining the variable bandwidth w
t
that minimizes the MISE by optimizing the local interval length.
To select an interval length for local optimization, we introduce the local MISE criterion at time t as
$$ \mbox{{\em local}MISE}=\int E\left(\hat{\lambda}_{u} - \lambda_{u}\right)^{2}\rho_{W} ^{u-t}du, $$
(13)
where \(\hat{\lambda}_{u} = \int x_{u-s} k_{w}(s) \,ds\) is an estimated rate with a fixed bandwidth w. Here, a weight function \(\rho _{W}^{u-t}\) localizes the integration of the squared error in a characteristic interval W centered at time t. An example of the weight function is once again the Gauss density function. See the Appendix for the specific algorithm for the Gauss weight function. As in Eq. (5), we introduce the local cost function at time t by subtracting the term irrelevant for the choice of w as
$$ C_{n}^{t}\left( w,W\right) =\mbox{{\em local}MISE}-\int\lambda_{u} ^{2}\rho\,_{W}^{u-t} \,du. $$
(14)
The optimal fixed bandwidth w
∗ is obtained as a minimizer of the estimated cost function:
$$ \begin{array}{rcl} \hat{C}_{n}^t\left( w,W\right) &=&\frac{1}{n^2}\sum\limits_{i,j}\psi_{w,W}^{t}\left( t_{i} ,t_{j}\right) \\ && -\frac{2}{n^2} \sum\limits_{i\neq j}k_{w}\left( t_{i}-t_{j}\right) \rho _{W}^{t_{i}-t}, \label{eq:CostFunction_Local_Estimated} \end{array} $$
(15)
where
$$ \psi_{w,W}^{t}\left( t_{i},t_{j}\right) =\int k_{w}\left( u-t_{i}\right) k_{w}\left( u-t_{j}\right) \rho_{W}^{u-t}du.\label{eq:phai_local} $$
(16)
The derivation follows the same steps as in the previous section. Depending on the interval length W, the optimal bandwidth w
∗ varies. We suggest selecting an interval length that scales with the optimal bandwidth as γ
− 1
w
∗ . The parameter γ regulates the interval length for local optimization: With small γ( ≪ 1) , the fixed bandwidth is optimized within a long interval; With large γ(~1) , the fixed bandwidth is optimized within a short interval. The interval length and fixed bandwidth, selected at time t, are denoted as \(W_{t}^{\gamma}\) and \(\bar{w}_{t}^{\gamma}\).
The locally optimized bandwidth \(\bar{w}_{t}^{\gamma}\) is repeatedly obtained for different t( ∈ [a,b]). Because the intervals overlap, we adopt the Nadaraya-Watson kernel regression (Nadaraya 1964; Watson 1964) of \(\bar {w}_{t}^{\gamma}\) as a local bandwidth at time t:
$$ w^\gamma_{t}=\left. \int\rho_{W_{s}^\gamma}^{t-s}\bar{w}_{s}^\gamma \,ds\right/ \int\rho_{W_{s}^\gamma}^{t-s}\,ds. \label{eq:Nadaraya-Watson} $$
(17)
The variable bandwidth \(w_{t}^{\gamma}\) obtained from the same data, but with different γ, exhibits different degrees of smoothness: With small γ( ≪ 1) , the variable bandwidth fluctuates slightly; With large γ(~1) , the variable bandwidth fluctuates significantly. The parameter γ is thus a smoothing parameter for the variable bandwidth. Similar to the fixed bandwidth, the goodness-of-fit of the variable bandwidth can be estimated from the data. The cost function for the variable bandwidth selected with γ is obtained as
$$ \hat{C_n}\left( \gamma \right) =\int_{a}^{b}\hat{\lambda}_{t}^{2} \,dt- \frac{2}{n^2}\sum_{i\neq j}k_{w_{t_i}^\gamma}\left( t_{i}-t_{j}\right) ,\label{eq:CostFunction_Variable_Estimated} $$
(18)
where \(\hat{\lambda}_{t}=\int x_{t-s} k_{w_t^\gamma}\left( s\right)ds\) is an estimated rate, with the variable bandwidth \(w_t^\gamma\). The integral is calculated numerically. With the stiffness constant γ
∗ that minimizes Eq. (18), local optimization is performed in an ideal interval length. The method for optimizing the variable kernel bandwidth is summarized in Algorithm 2.