RippleNet: a Recurrent Neural Network for Sharp Wave Ripple (SPW-R) Detection

Hippocampal sharp wave ripples (SPW-R) have been identified as key bio-markers of important brain functions such as memory consolidation and decision making. Understanding their underlying mechanisms in healthy and pathological brain function and behaviour rely on accurate SPW-R detection. In this multidisciplinary study, we propose a novel, self-improving artificial intelligence (AI) detection method in the form of deep Recurrent Neural Networks (RNN) with Long Short-Term memory (LSTM) layers that can learn features of SPW-R events from raw, labeled input data. The approach contrasts conventional routines that typically relies on hand-crafted, heuristic feature extraction and often laborious manual curation. The algorithm is trained using supervised learning on hand-curated data sets with SPW-R events obtained under controlled conditions. The input to the algorithm is the local field potential (LFP), the low-frequency part of extracellularly recorded electric potentials from the CA1 region of the hippocampus. Its output predictions can be interpreted as time-varying probabilities of SPW-R events for the duration of the inputs. A simple thresholding applied to the output probabilities is found to identify times of SPW-R events with high precision. The non-causal, or bidirectional variant of the proposed algorithm demonstrates consistently better accuracy compared to the causal, or unidirectional counterpart. Reference implementations of the algorithm, named ‘RippleNet’, are open source, freely available, and implemented using a common open-source framework for neural networks (tensorflow.keras) and can be easily incorporated into existing data analysis workflows for processing experimental data.


Introduction
Experimental background: Sharp wave ripples (SPW-R) are highly synchronous, fast oscillations observed in the CA1 region of the hippocampus of mammals, and are linked to mechanisms that play important roles in memory function (Buzsáki 2015). The oscillations associated with SPW-Rs are typically observed in the local field potential (LFP), which is the low-frequency ( 300 Hz) part of extracellularly recorded electric potentials measured in neural tissue. SPW-Rs arise in sleep and resting states and consist of large amplitude deflections of the local field potential (LFP) signal originating in the hippocampal CA3 region ('sharp waves'), that can elicit fast oscillations in the hippocampal CA1 region ('ripples'). Excitatory output from the CA1 region during ripples encodes sequences of neuronal activation of awake experiences, that reaches wide areas of the cortex as well as subcortical nuclei. For a comprehensive review on SPW-Rs, their origin and function, see for example Buzsáki 2015.
The SPW-R oscillations are observed above the cortical γ-band frequencies (30 − 90 Hz) (Silva 2013) of the LFP, and lie between 160 − 180 Hz in mice (Buzsáki, Logothetis, and Singer 2013;Buzsáki et al. 2003), and between 130 − 160 Hz in rats (Buzsáki, Logothetis, and Singer 2013;Buzsaki et al. 1992;John O'Keefe 1978). Features of one such example SPW-R event recorded using the experimental setup in Figure 1A,B are illustrated in Figure 1C. SPW-R spectrogram 10 4 10 3 10 2 10 1 (a.u.) Figure 1: Experimental setup. A set of recordings used for this study were acquired concomitant to two-photon microscopy from head-fixed mice. A: Mice were prepared with a single electrode in the the hippocampal CA1 region and a contralateral reference electrode, chronic glass window for two-photon microscopy and a headbar for head-fixation. B: LFP recordings were mostly recorded concomitant to two-photon microscopy in head-fixed mice. C: Example of a single detected SPW-R. Top panel: raw LFP data; Middle panel: bandpass filtered LFP signal (150-250 Hz); Bottom panel: LFP spectrogram.

SPW-R detection:
The fractions of neurons in hippocampal regions CA1 and CA3 which are active during different SPW-R events vary greatly, and the number of small and medium-sized events outnumber large, highly synchronous events (Csicsvari et al. 1999a;Buzsáki 2015). Hence, the resulting distributions of SPW-R power are skewed as the synchrony between neurons throughout the network (i.e., correlations) greatly affects the LFP power in the SPW-R band (Csicsvari et al. 2000;Schomburg et al. 2012;Buzsáki 2015;Hagen et al. 2016). Consequently, defining a fixed threshold for SPW-R detection based on e.g., the power or envelope of the LFP in a chosen frequency band (Csicsvari et al. 1999a;Csicsvari et al. 1999b;Einevoll et al. 2013;Ramirez-Villegas, Logothetis, and Besserve 2015;Norman et al. 2019;Tingley and Buzsáki 2020) remains heuristic.
In speech recognition, deep RNNs with multiple stacked LSTM layers have been successful in classifying phonemes (Graves, Mohamed, and Hinton 2013). Graves, Mohamed, and Hinton 2013 also found bidirectional LSTM RNNs to improve classification performance over unidirectional LSTM RNNs, which can only account for past context. The present context of SPW-R detection is analogous and amounts to recognition of a single phoneme or word in a temporal sequential signal such as sound.
Summary of present work: Here, inspired by RNNs shown to be successful on speechrecognition tasks, we propose the utilization of LSTM-based RNNs for the automated detection of SPW-R events in continuous LFP data. Our open-source implementation, RippleNet, is built with a combination of convolutional, (bidirectional) LSTM and dense output layers with non-linear activation functions. RippleNet accepts raw LFP traces of arbitrary length as input, and omits the typical SPW-R detection steps of band-pass filtering the input LFP as well as manual feature extractions such as computing the signal envelope via the Hilbert transform or time-frequency resolved spectrograms. Using training data with labeled SPW-R events in real-world datasets from different sources we trained RippleNet to predict a continuous signal representing the time varying probability of SPW-R event in the input. A simple search of local peaks above a fixed threshold can then be applied with the output probabilities, and is here shown to yield accurate predictions of the time of SPW-R events in separate validation and test datasets, with low prediction rates of false positive (FP) and false negative (FN) events. RippleNet also runs faster than real-time on typical CPUs, and even faster on graphical processing units (GPU).
RippleNet's implementation differs from Zuo et al. 2019, one of very few deep CNN based algorithms specifically designed for detection of high-frequency oscillations (HFO), that is, epileptogenic zone seizures in intracranial electroencephalogram (iEEG) recordings, in that (1) explicit conversion of 1D input sequences with multiple rows into gray-scale images are avoided; (2) normalization of the input to zero mean and standard deviation to unity is not required; (3) input segments can be of arbitrary length (i.e., continuous) but segments within single batches have to be of the same length; (4) a fairly low number of parameters are trainable which may reduce overfitting; and (5) its outputs are continuous signals that represent the time varying probability of an SPW-R event at all time points of the inputs, in contrast to classifying whether or not a HFO class occurs in each fixed-size input segment.
Manuscript structure: This paper is organized as follows: In Methods we detail the acquisition of experimental LFP data, labeling of SPW-R events, preprocessing steps, numerical analysis, and the technical implementation of RippleNet. In Results we evaluate the performance of RippleNet during training and application on separate validation and test sets. In Discussion we discuss the possible consequences, extensions and other applications of this work. Male and female mice (C57Bl/6J; Janvier labs) underwent LFP electrode implant surgery at approximately 10-14 weeks of age. All mice had previously been implanted 2-3 weeks earlier with a custom made titanium headbar glued to the skull and covered with a dental cement cap. For electrode implant surgery, mice were anesthetized with isoflurane (3 induction, 1.5 maintenance) with body temperature maintained at 37 degrees Celsius. Burr holes were drilled for the LFP electrode and reference electrode over the dorsal CA1 region of the hippocampus (A/P -2 mm, M/L 2 mm) and contralateral primary somatosensory cortex (A/P -0.5 mm, M/L 3 mm), respectively. Silver wire electrodes (1.25 mm diameter, insulated, GoodFellow) were lowered to a depth of 0.8 mm for dorsal CA1. The reference electrode was implanted at the brain surface. Mice were allowed to recover from isoflurane anesthesia while headfixed for at least 15 minutes, and electrode placement was confirmed by monitoring the LFP signal online. Electrodes were affixed to the headbar with cyanoacrylate glue and a thin layer of dental cement.
All procedures were approved by the Norwegian Food Safety Authority (project: FOTS 19129 LFP recordings were band-pass filtered (0.1-1000 Hz) and amplified (1000x) with a DAM50 differential amplifier (World Precision Instruments Inc). Line noise was removed using a HumBug 50/60 Hz Noise Eliminator (Quest Scientific Inc). For experiments, mice were headfixed under a two-photon microscope objective after brief isoflurane anesthesia. They were given at least 15 minutes to recover from anesthesia before recordings were taken. In most cases, LFP recordings were performed concurrently with two-photon calcium imaging through a chronic cranial window over the retrosplenial cortex, in 10 minute sessions. During recordings, mice were able to walk freely on a disc equipped with a rotary encoder to record locomotion, grooming and postural adjustments. Experiments were performed in the dark. LFP and rotary encoder signals were acquired at 20 kHz and downsampled to a final sampling frequency f s = 2500 Hz in LabView (National Instruments). The LFP signals were saved in units of millivolts (mV).
The different animals, number of sessions, total recording durations and number of SPW-R events are listed in Table 1.
SPW-R detection procedure: Pre-processing for manual SPW-R detection was performed using MATLAB 2018 1 . The LFP signal was first band-pass filtered between 150 and 250 Hz using a digital filter filtfilt. The coefficients for the order 600 finite impulse response (FIR) filter were generated using the fir1 function. The band-pass filtered LFP was then used to compute the absolute of the Hilbert transform of the data. The output was smoothed by convolving with a 1052point Gaussian filter with σ = 40 ms using gaussfilt (James Conder 2020). The findpeaks function was used to find peaks which were 3 standard deviations above mean in a moving time window with duration 1 s. A minimum peak width at half height of 15 ms and a minimal peak distance of 25 ms were required, calculated based on data reported in Axmacher, Elger, and Fell 2008;Davidson, Kloosterman, and Wilson 2009;Caputi et al. 2012. Potential ripple locations where then manually inspected using the symmetric one second time window around it, based on the Hilbert transformation and the raw LFP signal.

Rat data
To supplement the training and validation datasets containing SPW-R events that could be extracted from the in-house datas described above, we utilized publicly available datasets from the Buzsaki lab webshare 2 (Petersen, Hernandez, and Buzsáki 2018). The datas were obtained in the adult rat (Long Evans) in awake and sleep states using chronically implanted probes with a total of 96 or 128 channels (Tingley and Buzsáki 2018;Tingley and Buzsáki 2020). The datasets were The LFP signal of contacts located in CA1 was extracted and converted to units of mV, along with the corresponding times and durations of labeled CA1 SPW-R events. SPW-R events were identified and labeled as described in Tingley and Buzsáki 2020. We here defined SPW-R event times as the mean of onset and offset times. All events in awake and sleep states were extracted. The sampling frequency of the LFP data was here f s = 1250 Hz.
The different animals, number of sessions, total recording durations and number of SPW-R events are summarized in Table 1.

Data preprocessing
The mouse LFP data were downsampled to a common sampling frequency f s = 1250 Hz and temporal resolution ∆t = 1/f s . For temporal downsampling we used the scipy.signal.decimate function with default parameters. The rat LFP datasets were used as is. For visualization, we extracted the band-pass filtered LFP φ BP (t) from the LFP φ(t) using 2nd-order Butterworth filter coefficients computed with critical frequencies f c ∈ {150, 250} Hz. Filters were applied to φ(t) using a zero phase shift, forward-backward filter implementation. Filter coefficients were computed using scipy.signal.butter and applied with scipy.signal.filtfilt.

Wavelet spectrograms
To compute spectrograms of LFP data φ(t) we relied on the complex Morlet transform with parameters ω = 6, scaling factor s = 1 and lengths M f = 2sf s ω/f for fundamental frequencies f ∈ {100, 110 . . . , 240, 250} Hz. The numbers M f were rounded down to the nearest integer. The set of discrete wavelet coefficients for each frequency f were computed using the function scipy.signal.morlet as Each row of the spectrograms S(t, f ) = [S f (t)] were then computed for all frequencies in f as where the asterisk denotes a convolution. We used the discrete 1D implementation by scipy.signal.convolution in 'full' mode. To visualize the spectrograms, we employed a loglinear matplotlib.cm.inferno color map, with lower and upper limits determined as exp(c), where c is the 1% and 99% percentiles of log(S), respectively.

Training, validation and test data
Input data: We chose to use the raw single-channel LFP data segments as input to the neural network algorithm, that is, by defining X(t) = [φ(t)]. For reasons related to the RNN implementation we defined each segment X(t) as shape (n timesteps , 1) arrays, even if we here work with single-channel LFP data.
One-hot encoding of SPW-R events: The train of n labeled times t i of the SPW-R events in each continuous LFP data segment can be expressed mathematically as where δ(·) denotes the dirac delta distribution, and i the index of the event in a session. We then assumed that each SPW-R has a typical duration D = 50 ms on the interval [t i − D/2, t i + D/2). A binary 'one-hot' encoding vector for the SPW-R events y(t) was then computed as for the entire duration of each LFP segment. Here θ(·) denotes the Heaviside step function. The vector y(t) can be interpreted as the time-varying, binary probability p ∈ [0, 1] of an SPW-R occurring at any given time t.
Datasets: As the SPW-R occurrence in the data was sparse (that is, y(t) = 0 for most t), training the neural network on entire data segments of different durations is impractical. A likely training outcome is predictingŷ(t) = 0 for all times t of the input. For each labeled SPW-R event we therefore extracted temporal segments of duration T sample = 1000 ms from X(t) and y(t), that is, on the interval [t j − T j offset − T sample /2, t j − T j offset + T sample /2). The offsets T j offset ∈ [−(T sample − 3D)/2, (T sample − 3D)/2) were randomly drawn for each event. The superscript j here denotes a sample indexed by j from any LFP recording session.
For the total number n SPW−R of SPW-R samples across all animals and sessions, the shapes of the combined input and output dataset matrices X and y for training, validation and testing were both (n SPW−R , T sample /∆t, 1).
All data entries except for a hold-out set were randomly reordered along their first axis, and then split into 2 separate file sets for validation and training, each of sizes summarized in table 1. The validation set was used to monitor loss during training and quantification of performance as detailed below. The hold-out test set constructed from the entire session of one animal (mouse 4029) was only utilized for final evaluation of the RNN after training and validation. For visualization purposes we also stored the corresponding segments of band-pass filtered LFP (φ j BP (t)) and spectrograms (S j (t, f )) for every labeled event.
In an effort to balance the set of features that can be learned by RippleNet from datas obtained mice and rat, we extracted a similar count of SPW-R events for training and validation from the rat data as for the mouse data.

RippleNet implementation 2.3.1 Network description
The causal and non-causal RippleNet implementations, summarized schematically and with parameters in Table 2, consist of a Gaussian noise layer applied to the input, then one 1D convolutional layer followed by a dropout layer, followed by another 1D convolutional layer followed by batch-normalization, rectified-linear (ReLu) activation and dropout layers. The output of the last convolutional layer are consecutively fed to the first (bidirectional) LSTM layer followed by batch normalization and dropout. The final (bidirectional) LSTM layer is followed by a dropout, batchnormalization and a final dropout layer. Bidirectional layers are optionally applied using a wrapper function. The output is governed by a time-distributed layer wrapping a dense activation layer, which facilitates application of the dense layer to every temporal slice of the input. Hence, the output is a matrix of the same dimensionality as the input matrix.
The Gaussian noise layer and dropout layers are only active during training in order to prevent overfitting of the training set, and inactive during validation and testing. The kernel weights of the convolutional, dense and LSTM layers are initialized with the Glorot uniform initializer. Recurrent connections in LSTM layers are initialized using the Orthogonal initializer. For optimization we chose the Adam algorithm which implements an adaptive stochastic gradient descent method (Kingma and Ba 2014). The settings for model compilation, optimization algorithm and model fitting are summarized in Table 3. For 3-fold cross-validation different RNN instances are initialized using different random seeds affecting initializers, dropout layers and optimization. This ensure replicable results on similar GPU hardware.
Layer dimensions were hand tuned, with the aim of reducing the amount of trainable parameters and reducing evaluation times and overall training times, while maintaining achievable loss J and M SE reasonably low.

Loss function and evaluation metric
For training the RNN we used the binary cross entropy loss function (tf.keras.losses.BinaryCrossentropy) where N = T sample /∆t is the number of temporal samples in the label array y and RNN prediction y. To monitor training and validation performance of the RNN we used the mean squared error

Thresholding
The outputŷ j (t) ∈ (0, 1) of RippleNet is a discrete signal of same temporal duration and resolution as an input segment X j (t). The signalŷ j (t) can be interpreted as the time-varying probability of an SPW-R ripple event. To extract time points of candidate ripple events, we ran the peakfinding algorithm implemented by scipy.signal.find_peaks using an initial threshold of 0.5, minimum peak inter-distance of 50 ms (same as D) and peak width of 20 ms. These parameters were set heuristically. Other parameters were left at their default values.

Quantification of true and false predictions
On the validation and test data sets, we counted a true positive (T P ) for the predicted timet SPW−R of an SPW-R event if y(t SPW−R ) = 1, false positive (F P ) if y(t SPW−R ) = 0 and false negative (F N ) if no peaks above threshold inŷ(t) were found in time intervals where y(t) = 1.ŷ(t) can be above threshold if FP predictions occur next to labeled SPW-R events and result in FN counts. Negative samples, where y(t) = 0 for all times spanned by the LFP samples, were not included in any of the training, validation or test sets. Hence, evaluation of true negative (T N ) predictions were not performed. Note, however, that by construction, each sample y j (t) was equal to zero up to 95% of the time spanned by the sample, and that more than one SPW-R event may exist in each segment.  The following quantification metrics of SPW-R detection performance are used: Recall is sometimes referred to as True Positive Rate (TPR) and Sensitivity in the literature (e.g., by Zuo et al. 2019). Precision is also known as Positive Predictive Value (PPV). The F 1 score represents the harmonic mean of Precision and Recall. These metrics are all defined on the interval [0, 1], with 1 being best.

Correlation analysis
To quantify the temporal agreement with labeled and predicted SPW-R event times, we compute the cross-correlation coefficients between predicted ripple event timest j and labeled event times t j as function of time lag τ as (Eggermont 2010, Eq. 5.10): Here, υ andυ are the time binned N υ and Nυ times of labeled and predicted SPW-R events using a bin width ∆ = 2 ms, where N = N υ + Nυ.

Signal energy
To quantify 'strengths' of ripples in the band-pass filtered LFP, we compute the signal energy (not to be confused with energy in physics) as where N = 2τ /∆t and τ ∈ [−100, 100] ms denotes time relative to the SPW-R event time.

Results
We here present our main findings and analysis of RippleNet, an automated, trainable recurrent neural network algorithm for detecting SPW-R events in single-channel LFP recordings.

Experimental data
Brain signals such as the LFP are characterized by low-frequency fluctuations, with spurious oscillatory events that may occur in different parts of the frequency spectrum. A few 1 s samples of hippocampus CA1 LFP, here used as validation data X j (t) ∈ X val for the SPW-R detection algorithm, are shown in Figure 2A. Each sample contains at least one labeled SPW-R event at times marked by the green diamond symbols. The SPW-R events identified using a conventional method involving manual steps (cf. Methods), are hardly discernible by eye. They stand out, however, in the corresponding bandpass-filtered LFP signals φ j BP (t) (panel B) and in the time-frequency resolved spectrograms S j (t, f ) (panel C). Individual samples may also include potential SPW-R events that were not labeled. Events may have amplitudes of ∼ 0.1 mV in the filtered signal. Their durations are also short ( 100 ms).

Training and validation of RippleNet
We next continue with 3-fold cross validation of different instances of RippleNet during and after training. We investigate both unidirectional (causal) and bidirectional (non-causal) variants of Rip-pleNet, keeping the counts of trainable parameters about a factor two higher for the unidirectional variant (cf. Table 2). The computational load during training was approximately a factor two higher for the bidirectional variant. We observed ∼90 and ∼170 ms/step on Tesla P4 GPUs during training, respectively. The entire dataset (X, y) is split into separate training, validation and test sets with dimensions detailed in Table 1. Each RippleNet instance was initialized in each trial with different random seeds affecting initial weights, parameters etc., throughout the different model layers.
The training and validation loss J and M SE as function of training epoch are shown in Figure 3A,B, respectively. Continuous lines show training performance, while dashed lines show validation performance for different models. Models M1-3 are unidirectional, while models M4-6 are bidirectional. All models within each group, except M1, display similar and stable trajectories for their 3 python.org 4 jupyter.org 5 numpy.org 6 scipy.org 7 h5py.org 8 matplotlib.org 9 pandas.pydata.org 10 www.anaconda.com 11 seaborn.pydata.org 12 tensorflow.org 13 colab.research.google.com

Validation set performance
Training and validation losses J and M SE only provide an indication of the ability to detect SPW-R events using the different models. First, in Figure 4 we visually compare a subset of predictionŝ y j (t) on LFP samples from a validation set (X j (t) ∈ X val (t)), to one-hot encoded SPW-R events y j (t) (see Methods). Here, all models produce predictions (panels C) with responses above the detection threshold for labeled events, but spurious threshold crossings may occur elsewhere.
The non-causal bidirectional RippleNet variants (models M4-M6) produce output with notably less spurious fluctuations below threshold, when compared to the causal variants (M1-M3). These spurious fluctuations reflect the fact that signal power in the expected frequency range of SPW-R events do not vanish due to other ongoing neural processes, measurement noise etc. The bidirectional models do an overall better job at predicting the boxcar shapes of the one-hot encoded SPW-R events in panel B, owing to the fact that the full input time series are factored into their predictions. We next quantify the different models' performance in terms of counts of true positives (TP), false positives (FP) and false negative (FN) on the full validation set. Summarized in Table 4, trained models of each kind (uni-vs. bidirectional RippleNets) show similar numbers of TP events using the initial settings for the peak-finding algorithm when applied to the individual model predictionŝ y j (t). The counts of TPs are for all models generally in the same range, but total error counts (FP plus FN counts) are consistently higher for the unidirectional RippleNets compared to their bidirectional counterparts.
In Table 4 we also compute the corresponding measures of performance from the TP, FP and FN counts. P recision, the ratio between TP predictions and total number of predictions, and Recall, the ratio between TP predictions and the sum of TP and FN predictions, are for most models around 0.9. Bidirectional models which show a better performance in terms of TP, FP and TN counts, and result in improved F 1 scores of around 0.92.
y 4 M6 (t) 100 ms Figure 4: Comparison of RippleNet predictions on samples from the same test set. Each column corresponds to different LFP samples X j shown at the top. A: Input LFP samples X j . The diamonds mark the labeled times of SPW-R events. B: Label vectors y j (t). C: Predictionsŷ j (t) made by the different RippleNet instances. SPW-R events found by the peak-finding algorithm are marked with diamond markers.

Effect of detection threshold and width parameters
The analysis above assumes fixed hyper-parameters for the peak-finding algorithm (cf. Methods) applied to the predictions by RippleNet instances on the validation set, including threshold, minimal peak interdistance and width (in units of time steps of size ∆t). We next hypothesize that the total error counts (FP+FN) can be minimized and correct prediction counts (TP) can be maximized using a hyper-parameter grid search, and therefore chose to optimize thresholds and widths for each network with respect to the F 1 -score. We keep the minimal peak interdistance the same as the boxcar filter width used to construct y(t). Summarized in Figure 5, the TP and FP counts for each model increased when lowering the threshold and width. FN counts increase for high threshold values and widths. Bidirectional models (M4-6) are less affected by the width setting compared to the unidirectional variants. The different models display different 'sweet spots' in terms of total number or errors (FP+FN). These counts are reflected in the calculated P recision and Recall values. The F 1 space show for some models multiple local maxima. Here model 4 has the overall best performance, both in terms of least amounts of errors and highest F 1 score. For further analysis and later application to a hidden test set we therefore choose that model, with detection threshold 0.7 and peak width of 0 time steps. In passing, we note that the other two bidirectional RippleNet instances achieve nearly similar levels of performance, while unidirectional variants have higher error counts.  Figure 5: Effect of varying parameters threshold and width for the peak finding algorithm applied to predictions on the validation datas made by different RippleNet instances. Each row corresponds to different models and the columns to different metrics. Colorbars are shared among panels in each column. The cross hatches in the F 1 column correspond to parameter combinations maximizing the F 1 score as summarized in Table 5.

False (FP & FN) predictions
Having assessed the best performing RippleNet model instance and combination of width and threshold parameters on the validation set (Table 5), we next analyze features FP and FN predictions on the validation dataset. LFP samples resulting in FP and/or FN predictions are illustrated in Figure 6-7. From this subset of samples, FP predictions appear to occur for transient events carrying power in the 150-250 Hz frequency range as reflected in both band-pass filtered LFPs (panels B) and LFP spectrograms (panels C), similar to correct (TP) predictions. One explanation may be that the procedure used to process the data initially either missed SPW-R events with poor signal-to-noise ratio, or that they were rejected manually based on some criteria. The prediction vectorsŷ j (t) approach a value of 1 in some FP cases, implying a high probability of an actual SPW-R event.
For the set of samples resulting in FN predictions, the RippleNet make predictionsŷ(t) with magnitudes during the labeled SPW-R events that simply fail to produce a large enough amplitude and/or width for the peak-finding algorithm to detect the event. Here, a reduction of the threshold value for example will reduce FN counts, and increase FP and TP counts (cf. row 4 in Figure 5). Other cases resulting in both a FN and FP registration occurs if the predicted event time is outside of the boxcar shapes of the one-hot encoded signal. One such case is occurring in column 2 of Figure 6.

Ripple detection in time-continuous LFP data.
With the same RippleNet instance as in the previous section, we next pay attention to the litmus test of this project, that is, applications to time continuous LFP recordings of arbitrary durations. We choose the 10 minute duration LFP signal of one session of one animal excluded from training or validation data (mouse 4029, session 1, see Table 1), and make predictions using RippleNet. This hold-out data set mimics new recordings unavailable at the time of training the networks. Predicted events within 1 s of movement periods are removed from the analysis to suppress FPs resulting from e.g., muscle noise.
By construction, the RippleNet algorithm can, in principle, be run on LFPs of arbitrary duration, even if all training and validation samples are of the same duration. We tested two operating modes: Either feeding in the entire LFP sequence as a single sample (not shown), or reshaping the LFP sequence into many sequential samples of the same duration. For the latter the predictions made on each sample (ŷ j (t)) can be concatenated together to form a continuous signal spanning the duration of the LFP entirely. In practice, 5-fold zero-padding and splitting of the signal into samples of duration 0.5 s, running predictions, concatenating predictions and computing the median output worked well on the hidden test set. Start-up transients in the output are suppressed by zero-padding the beginning and end of the full LFP signal by various amounts and realigning the predictions accordingly before computing the median.
For the 10 s segment X(t) shown in Figure 8A, with corresponding bandpass-filtered LFP, spectrogram and one-hot encoded events (panels B-D), all labeled SPW-R events are found (panel E). Not surprisingly, other significant responses with strengths above the peak-finding detection threshold are also found, resulting in a larger count of FPs compared to TPs (summarized in Table 6). Based on the previous analysis on a validation set with no negative samples that result in an error rate of about one per seven TP SPW-R event, we expect a higher frequency of FP predictions when predictions are made on samples spanning the entire session. For the approximate 10 minutes duration of the input LFP sequence, the chosen RippleNet instance finds about two times the number of events compared to the number of labeled SPW-R events in the input LFP sequence, see Figure 9A and Table 6. The cumulative count of predicted events appears linearly dependent on the cumulative count of labeled events. The cross-correlation coefficients ρ yŷ (τ ) between predicted event times and labeled events in the test set (in bins of 2 ms) in Figure 9B, demonstrates a temporally precise prediction of event times, well within the 50 ms boxcar window around for each labeled SPW-R event in y(t) ( Figure 8D).

Features of predicted SPW-R events
Having established that the chosen RippleNet instance predicts presumed FP events at a high rate relative to TP rate in continuous LFP data, we next investigate the dependence between predicted SPW-R probability (ŷ j ) and signal energy in the band-pass filtered LFP E j s (Equation (14)). The RippleNet instance fares well with the labeled events in the hidden test set, with only a handful of FNs but many FPs (summarized in Table 6). The majority of labeled samples result in probabilitieŝ y j above the detection threshold 0.7. The ten samples with highest predicted probability are shown in Figure 10 rows 1-3, as well as the 10 samples with lowest predicted probability in rows 4-6. The RippleNet model instance recognizes SPW-R events with high amplitudes and quite stereotypical appearance both in the band-pass filtered LFP and spectrograms. At the lower end of the scale, SPW-R events show irregular fluctuations at lower amplitudes. The same holds true for the SPW-R events detected above threshold by the RippleNet algorithm ( Figure 11). Detected events have transient activity around 150 Hz in their respective spectrograms.

Summary
In this paper we have introduced the RippleNet algorithm for detecting SPW-R events in timecontinuous LFP data as recorded with single-or multi-channel probes in hippocampus CA1. The development of the RNN was motivated by high-performance speech recognition systems which utilize deep LSTM based RNNs (Graves, Mohamed, and Hinton 2013;Michalek and Vanek 2018).
In the present context, the binary problem of detecting SPW-R events is simpler than speech recognition which must distinguish between different phonemes making up a spoken language. As such, the SPW-R detection task is analogous to mobile device wake up call detection to commands such as "Hello Siri!" or "OK, Google!" in noisy environments.

RippleNet performance on validation data
We trained different versions of RippleNet, each instantiated with different random weights on the same samples from the full set of manually scored data obtained in both mouse and rat CA1. On our validation dataset with mouse and rat data, the best-performing RippleNet instance resulted in 522 TP, 43 FP and 36 FN SWP-R predictions resulting in a combined F 1 score of 0.93 (Table 5).
The non-causal versions of RippleNet utilizing bidirectional LSTMs is found to outperform the causal unidirectional versions during training and validation. On the validation data with optimized detection threshold settings, unidirectional RippleNets achieved similar TP counts, but with consistently higher error counts than bidirectional variants. The best-performing unidirectional RippleNet resulted in 518 TPs, 71 FPs and 38 FNs and F 1 = 0.905.
The fact that the bidirectional versions outperform the unidirectional versions during training and validation, even if numbers of trainable parameters are larger for the latter cases is an indication that also the future context of the input LFP contains information about SPW-R events. Thus, real-time applications of RippleNet may be hampered by use of the unidirectional version, which may only use past and present context in order to make a prediction. For offline detection of SPW-R events the better choice is the bidirectional version.

RippleNet performance on test data
A hidden test dataset was also obtained from a single animal with one single session that is not included in any of the training/validation data. This best reflected real-world application to newly obtained LFP recordings in mouse. Features of actual SPW-R events may differ somewhat from those in the training/validation data obtained in different animals and species (Table 1). Test performance (in terms of loss J, MSE, Precision, Recall, F 1 ) can be expected to be reduced compared to results obtained on the train and validation set. Indeed, the resulting counts of 78 TPs, 104 FPs and 8 FNs resulted in poor Precision (0.429) but still good Recall (0.907) and harmonic mean between the two (F 1 ) of 0.582. We found that application of the RippleNet algorithm results in far more predictions of events with low energy than the conventional detection procedure used to label the test set initially.
When continuous LFP is used as input to the algorithm, RippleNet predicts many more events than were manually labeled in the test set, but superfluous events have similar features to labeled events. Given the nature of RNN parameters trained using backpropagation (Hochreiter and Schmidhuber 1997), the RippleNet algorithm may also be sensitive to hidden features in the LFP different than high-frequency (around 150 Hz or so) oscillations typically associated with SPW-R events, raising the question of whether or not conventional SPW-R detection algorithms relying on band-pass filtered LFPs discard useful information contained in other parts of the raw signal. One major caveat to the fact RippleNet algorithm finds most labeled events in the test and validation sets, but also many other positives, imply that the user must still make manual, quite likely subjective, judgements of whether or not detected events are true SPW-R events. As discussed below, results judged by a domain expert can be used to improve the method.

Improved data and extensions of RippleNet
Additional training data: Additional datasets containing labeled SPW-R events, available from online resources such as CRCNS.org (Teeters et al. 2008), can be added as soon as they become available. At present, several CRCNS deposits with CA1 LFPs have been made, but not every dataset comes with labeled SPW-R events. The uploaded datas are mostly obtained in rats using different kinds of electrodes such as laminar probes, tetrodes, and in different brain states such as sleep, anesthesized and awake states. While CA1 SPW-R events may represent underlying brain mechanisms that are highly preserved across species, it is a priori unclear if the SPW-R features our algorithm learned to recognize in the presently used mouse and rat datasets overlap with those in other data. The SPW-R events may for example have a different distributions of power across frequencies, or typical durations. For the present paper we opted to use only two sources of data, which should each be internally consistent in terms of data quality and methods (species, acquisition hardware, noise levels, data processing steps, label consistency etc.).
Data augmentation: Synthesizing recordings could also act as a potential supplement to real data. Generative Adverserial Nets (GAN) (Goodfellow et al. 2014) have for instance proven to produce very lifelike data in other domains such as image generation (e.g., Karras et al. 2019). There is an untapped potential to generate virtually unlimited amounts of 'fake' LFPs with similar statistics (power spectrum, temporal correlations, etc.) as the real data. A simple SPW-R model based on the superposition of modulated oscillatory events on pink (1/f ) noise was already proposed by Sethi and Kemere 2014, but pure pink noise can not account for the temporal correlations of real data.
Neural network architecture: RNNs with LSTM layers or Gated Rectified Unit (GRU) layers (Chung et al. 2014) have for some time been considered state-of-the-art in sequence learning (Bai, Zico Kolter, and Koltun 2018 should also be evaluated for the SPW-R detection task described throughout this manuscript. The framework developed here around the high-level tensorflow.keras module allows for straightforward comparison between different architectures. This comparison should also include conventional CNNs (LeCun, Bengio, and Hinton 2015) and variants such as deep residual networks (He et al. 2015) and inception networks (Szegedy et al. 2015;Ismail Fawaz et al. 2019). With the LSTM-based architectures we finally chose, one could potentially achieve even better performance by varying hyper parameters for the optimizer (e.g., learning rate), dropout layers (dropout rate), disabling batch-normalizing layers, optimize kernel sizes for the convolutional layers, add additional hidden layers and so forth.
Better-quality labels: While we here did not systematically compare predictions using fundamentally different architectures, we briefly tested multi-layered CNNs, causal TCNs, and replacing LSTM layers with GRU layers, but saw either worse or similar performance on the training and validation data. Similarly, we also tested increasing the layer sizes (and numbers of trainable parameters) and noted longer evaluation times and only slight improvements in accuracy. With the existing data and labels the features any deep learning method may learn is limited if labels are inaccurate. For instance, the rat dataset contained information on SPW-R durations (which we ignored) while the mouse data only contained their locations. More accurate predictions on the existing dataset during training and validation can be achieved by more thorough labeling, perhaps by multiple experts independently.
Self-improving method: As soon as RippleNet has been used to find SPW-R event times in batches of new data, validated SPW-R events can supplement the initial training dataset. Then, the pre-trained RippleNet instance presented here can be trained for more iterations, learn new features and consolidate learned features present in the initial and new samples. With time and several such iterations, an even better performance can be achieved. Another possibility is classification of different kinds of SPW-R events and non-SPW-R (noise) events. We did not yet persuade such classification, as it would require a modification to the final dense output layer to use the so-called Softmax activation function instead of the presently used sigmoid activation function. The output dimensionality and values would then reflect the number of classes and respective probabilities.
Online and offline RippleNet applications: The bidirectional LSTM architecture is here found to perform better than the causal, unidirectional LSTM variant. Only the latter may see potential use in online and real-time applications, which only rely on past and present segments of the input time series. High temporal accuracy may be achieved, and unidirectional RippleNet instances may therefore see potential use in closed loop experiments where a stimulus may be delivered at the times of detected SPW-R events. As the RippleNet algorithm runs on temporally downsampled LFP signals, a good realtime factor is achievable on short segments, in particular if the computer has GPU accelerating capabilities.
Pre-trained RippleNet models can easily be loaded (with the tf.keras.load_model function in Python), and can be incorporated into Python-based data processing workflows with ease. For this purpose, models may also be converted to the higher-level tf.estimator API.
Web/cloud applications: The majority of development and analysis of RippleNet was incorporated using Jupyter notebooks 14 running on the Google Colaboratory portal 15 with data file access and synchronization via Google Drive 16 . RippleNet can be provided as a Cloud service, or as a service running locally on the user's computer. The latter may facilitate on not having to upload potentially large files but will greatly benefit from a local GPU in order to accelerate compute times. Distribution of RippleNet to end users can be done using containers in Docker 17 , Kubernetes 18 or similar. A cloud service however would facilitate on the powerful GPU backends provided via services like Google Cloud 19 which also has efficient data handling.
Another option could also be a port of RippleNet to tensorflow.js as the model is already only using keras constructs. The conversion step appears trivial 20 and could allow execution of RippleNet in html contexts.
Graphical User Interface (GUI): The current RippleNet version is set up as a step-by-step workflows in Jupyter notebooks for training, validation and application to continuous data, respectively. While a standalone, interactive RippleNet application with a GUI is certainly possible to develop using cross-platform application tools such as PyQT 21 , it is presently only considered. A prototype jupyter notebook which allows for user-interactive rejection of detected events (noise events) and storage of accepted events has been implemented, however.

Outlook
This work is an effort to introduce novel machine learning and deep learning algorithms for the detection of SPW-R events in electrophysiological data. The RippleNet algorithm presented here learns through supervised learning an internal representation of features of SPW-R events, which facilitates a highly non-linear transformation of input LFP signals into output signals that represent the time-varying probabilities of SPW-R events. This represents a fundamental change from the typical procedure employed in standard detection workflows relying on hand crafted feature extraction. With access to more training data with labeled events, the method can improve by running more training iterations on new data. Our hope is that the tool can reduce the amount of time the experimentalist spend on manual extraction of SPW-R events using heuristic criteria, and allow for a better understanding of features of these events and their role in brain function. Also, we hope that this powerful framework may be adapted to other detection tasks, for instance onset of epileptic seizures. In this direction, there are many potential applications, also clinical.

Data and code availability statement
All source codes and data to reproduce the findings and illustrations of this paper are available at github.com/espenhgn/RippleNet and Zenodo.org (Hagen 2020).
6 Author contributions EH, KHP, RE and AJS conceived and conceptualized the project. EH wrote the paper. EH, ARC, RE, KHP, GTE and AJS cowrote the paper. ARC and RE measured mouse LFP datas and labeled SPW-R events. AJS wrote the manual SPW-R detection code. EH wrote and executed all codes for extracting training/validation/test datasets, neural networks, neural-network training, analysis and plots for this paper.