# A continuous mapping of sleep states through association of EEG with a mesoscale cortical model

- First Online:

- Received:
- Revised:
- Accepted:

## Abstract

Here we show that a mathematical model of the human sleep cycle can be used to obtain a detailed description of electroencephalogram (EEG) sleep stages, and we discuss how this analysis may aid in the prediction and prevention of seizures during sleep. The association between EEG data and the cortical model is found via locally linear embedding (LLE), a method of dimensionality reduction. We first show that LLE can distinguish between traditional sleep stages when applied to EEG data. It reliably separates REM and non-REM sleep and maps the EEG data to a low-dimensional output space where the sleep state changes smoothly over time. We also incorporate the concept of strongly connected components and use this as a method of automatic outlier rejection for EEG data. Then, by using LLE on a hybrid data set containing both sleep EEG and signals generated from the mesoscale cortical model, we quantify the relationship between the data and the mathematical model. This enables us to take any sample of sleep EEG data and associate it with a position among the continuous range of sleep states provided by the model; we can thus infer a trajectory of states as the subject sleeps. Lastly, we show that this method gives consistent results for various subjects over a full night of sleep and can be done in real time.

### Keywords

Sleep Sleep staging Sleep scoring Seizure Locally linear embedding Electroencephalogram Cortical model Mesoscale Mean-field## 1 Introduction

The standard method of sleep scoring involves categorization of electroencephalogram (EEG) data into five separate stages (Niedermeyer and da Silva 2005). However, the *discrete* nature of these stages limits their utility as analytical and predictive tools. For example, in a study of human epilepsy, it may be observed that a seizure occurred during stage 2 sleep. This prompts further questions: Was the subject descending to deeper stages of sleep or arising from them? How quickly was the subject moving through each stage? Was a transition imminent when the seizure occurred?

The use of a mathematical model of the human sleep cycle may allow us to answer such questions by providing a *continuous* spectrum of sleep states, ranging from REM to the deepest slow-wave sleep. If the model can be directly associated with human sleep EEG data, it will be possible to track the subject’s state to identify the stage as well as changes in sleep depth and proximity to transitions. Ideally, this would be done in real-time, where the state is continuously determined as the subject sleeps. The process must be consistent over various subjects and robust to non-standard sleep cycles and periods of waking.

Here we utilize a technique called locally linear embedding (LLE) to make this connection between a model of the human sleep cycle and EEG data. First, we present a model of the human cortex with subcortical inputs represented by added driven noise, and we describe the associated mathematical representation of the sleep cycle (Section 2). We then introduce the technique of locally linear embedding (Section 3) and show that it provides the ability to distinguish between sleep stages when applied to EEG data (Section 4). These results demonstrate reliable separation between REM and NREM sleep data and provide a smooth temporal progression through the various stages of sleep. We also present the concept of strongly connected components as a method of outlier rejection for EEG data (Section 3.2) and introduce a method for automatic selection of LLE parameters (Section 4.3). Then, by performing LLE on a hybrid data set containing both sleep EEG and signals generated from the mathematical model, we are able to integrate the EEG and the model (Section 5). This allows us to take any sample of sleep EEG data and determine its position within the continuous range of sleep states provided by the model. We show that this method provides consistent results for various subjects over a full night of sleep, and it could be done online as the subject sleeps.

## 2 Mean-field cortical model

### 2.1 Background and mathematics

Mean-field models of the cortex are well-suited to the study of brain states described by EEG signals, including sleep. The variables in these models, representing quantities that are averaged over the millimeter scale, are comparable to the mesoscale measurements of EEG electrodes. More specifically, we choose a cortical model developed most recently in Liley et al. (2002) and Steyn-Ross et al. (1999, 2003). In addition to sleep, it has been used to model epileptic seizures (Kramer et al. 2005), anesthesia (Steyn-Ross et al. 2004; Bojak and Liley 2005), and the transition to seizure due to application of anesthetic agents (Liley and Bojak 2005).

*L*added to represent neuromodulators that regulate the natural sleep cycle, as was done in Steyn-Ross et al. (2005):

*i*to population

*e*. The synaptic currents are functions of local input, e.g. \(N_e^{\beta} \tilde{S}_e\) where

*ξ*

_{1}:

*α*

_{ee}is a constant that determines the variance of the stochastic input. Please refer to Table 1 for further descriptions of all variables and parameters.

Dimensionless variables and parameters of the SPDE cortical model

Symbol | Description | Typical value |
---|---|---|

\(\tilde{h}_{e,i}\) | Spatially averaged soma potential for neuron populations | – |

\(\tilde{I}_{ee,ei}\) | Postsynaptic activation due to excitatory inputs | – |

\(\tilde{I}_{ie,ii}\) | Postsynaptic activation due to inhibitory inputs | – |

\(\tilde{\phi}_{e,i}\) | Long-range (corticocortical) input to | – |

\(\tilde{t}\) | Time (dimensionless) | – |

\(\tilde{x}\) | Space (dimensionless) | – |

Γ | Influence of synaptic input on mean soma potential | 4.6875 × 10 |

\(h^0_{e,i}\) | Reversal potential | 0, 1.0938 |

| Neurotransmitter rate constant | 12.0, 3.6 |

| Inverse length scale for corticocortical connections | 11.2, 11.2 |

| Subcortical input from excitatory population | 25.0, 25.0 |

| Subcortical input from inhibitory population | 25.0, 25.0 |

\(N^{\alpha}_{e,i}\) | Number of distant (corticocortical) connections from excitatory populations to | 3710, 3710 |

\(N^{\beta}_{e,i}\) | Number of local synaptic connections from | 410, 800 |

\(\tilde{g}_{e,i}\) | Slope at inflection point of sigmoid function \(\tilde{S}_e\) | − 29.021, − 19.347 |

\(\tilde{\theta}_{e,i}\) | Inflection point for sigmoid function \(\tilde{S}_e\) | 0.91406, 0.91406 |

For completeness, we will include the full model in our simulations; however, it should be noted that a reduced version would suffice in this case. For example, we will utilize only the temporal evolution of variables, so it would be possible to convert the model to a system of ODEs by removing the spatial derivatives from Eqs. (7) and (8). In addition, the subdivision of local excitatory inputs, represented by \(N^{\beta}_e \tilde{S}_e[\tilde{h}_e]\) in Eqs. (3) and (4), is unnecessary. Making these changes would perhaps reduce the computation time for numerical solutions to the model, but we would not expect them to affect the results.

For the purpose of modeling sleep, we will focus on the parameters *L* and \(\Delta h_e^{rest}\) and the variable \(\tilde{h}_e\). The parameters represent the actions of neuromodulators adenosine and acetylcholine (ACh) that aid in the regulation of the human sleep cycle. Adenosine reflects the activity of the homeostatic drive to sleep, which is modulated by various somnogens. The ACh input into the cortex is a measure of the activity of the various brain stem controllers of sleep. Note that we have not specifically modeled the complex intrinsic interactions between the various brain stem nuclei. In this paper, we are primarily concerned with the interaction of their neuromodulator output with the cerebral cortex and thus model their effects only as extrinsic alterations in ACh.

In general, adenosine acts to reduce the resting potential of excitatory cells, thus making them less likely to fire; ACh does the opposite by raising the resting potential. These changes are represented in the model by \(\Delta h_e^{rest}\), which adds directly to the resting potential of the excitatory population (disguised as a “1” in the dimensionless equations). In addition, ACh decreases the amplitude of the excitatory postsynaptic potential, effectively reducing the synaptic gain. In the model, this corresponds to a reduction in the effect of synaptic currents \(\tilde{I}_{ee}\) and \(\tilde{I}_{ei}\); therefore, the parameter *L* is multiplied by these quantities to simulate a change in synaptic gain. Lastly, as was done in Steyn-Ross et al. (2005), we take the mean excitatory soma potential \(\tilde{h}_e\) to be representative of cortical activity; we will compare this variable to EEG measurements using locally linear embedding.

### 2.2 Model of the human sleep cycle

The mechanisms underlying human sleep and waking are complex; for recent, detailed reviews of the brain stem and hypothalamic control of sleep in thalamo-cortical systems see Fuller et al. (2006, 2007), McCarley (2007), Rosenwasser (2009), Saper et al. (2005a, b). In summary, the wakeful state may be characterized by high levels of activity in aminergic, cholinergic, orexinergic and glutamatergic neuronal populations in the brain stem and hypothalamus. The overall effect is to maintain the thalamo-cortical neurons in a depolarized, active, and continually firing state. These excitatory neurons also inhibit activity in various gamma-amino-butyric-acid(GABA)ergic cell populations, particularly in the ventro-lateral pre-optic area (VLPO), basal forebrain, and in the reticular nucleus of the thalamus. With the build up of homeostatic and circadian pressure to sleep (possibly mediated by various activity-dependent somnogens such as adenosine), the wake-active neurons are inhibited, which then allows the sleep-promoting neurons of the VLPO to start firing and trigger the transition from wakefulness to NREM sleep. This results in quiescence of the aminergic, orexinergic, and cholinergic brain-stem neuromodulator centers; which in turn allows hyperpolarization of the cortico-thalamic systems and hence the burst firing patterns characteristic of slow wave sleep. If these neurons are only moderately hyperpolarized, the EEG is dominated by the sleep spindles and K-complexes characteristic of stage 2 sleep. With more profound hyperpolarization the EEG is dominated by the delta waves of stages 3 and 4 (Steriade and Amzica 1998; Steriade and Timofeev 2001). This progressive slowing of the dominant frequency is captured by measures such as the permutation entropy index (Olofsen et al. 2008). The transition from NREM to REM sleep is associated with cortico-thalamic depolarization caused by activation of cholinergic and glutamatergic brain stem systems (mainly in and near the pedunculo-pontine tegmentum). The neuromodulatory environment of REM sleep differs from the wakeful state in that the amines and orexinergic systems are inactive in REM sleep, but active in the wakefulness; however, this distinction is not explicit in the present model.

*L*and \(\Delta h_e^{rest}\). In order to visualize this, we look at steady-state solutions of

*h*

_{e}(without stochastic input) as

*L*and \(\Delta h_e^{rest}\) are varied; these solutions create what we will refer to as the “sleep manifold” (Fig. 1). Notice that, for most parameter values, there is only one steady state solution. However, in certain cases, there are three solutions (two stable and one unstable), causing the manifold to fold over on itself. This fold is seen on the left side of Fig. 1. In this model, the top branch of solutions on the manifold is intended to be representative of REM sleep. Starting at this point, we can imagine that during sleep, there is a gradual descent to deep slow-wave sleep by following a trajectory down the right side of the manifold where there is only one steady state solution. This happens in a smooth continuous manner. Then the quick transition from slow-wave sleep to REM is simulated by a jump across the fold from the bottom branch of solutions to the top branch. This mimics the rapid transition from deep sleep to REM that is observed in human EEG recordings. This process of gradually moving from REM to NREM sleep and then quickly jumping back to REM represents one sleep cycle in the model.

This model has been previously studied. Steyn-Ross et al. (2005) calculated the EEG total power, fractions of high and low power, and correlation time exhibited by the model at the transition from slow-wave sleep to REM; it was found that they qualitatively matched both human clinical sleep recordings and cortical measurements from a cat. The model was also studied in two spatial dimensions to investigate stable oscillatory states similar to slow-wave sleep, and it was shown that a transition from one state to another can occur due to stochastic fluctuations (Wilson et al. 2005). Lastly, Wilson et al. (2006) interpreted the k-complex as a transient shift from a stable low-firing state to an unstable high-firing state and used this model to demonstrate the mechanism by which the transition may occur.

Because we are interested in comparing this model directly to human EEG recordings, we will use the sleep manifold as a way to generate model “EEG-like” signals. We will choose values of *L* and \(\Delta h_e^{rest}\), find the numerical solution of the model for a given length of time, convert the dimensionless \(\tilde{h}_e\) to mV, and downsample it to match the EEG recordings. By doing this for many different values of *L* and \(\Delta h_e^{rest}\) we can obtain representative signals of every sleep stage.

It has previously been argued that \(\tilde{h}_e\) cannot be directly compared to measurements from cortical surface or scalp electrodes because those measurements are based on extracellular current flow, as opposed to the soma potential. This is important for the modeling of certain cortical phenomena; for example, in performing simulations of feedback control for the suppression of epileptic seizures, the value of the electrode measurement is fed directly back to the cortex to affect \(\tilde{h}_e\), with little or no time delay (Lopour and Szeri 2010). In that case, the relationship between \(\tilde{h}_e\) and the electrode measurement at any given time is very important. However, in the present analysis of EEG data using LLE, we are only interested in matching *scaled features* of the data that are calculated over 30-second intervals. We will not attempt to compare the temporal progression of \(\tilde{h}_e\) directly to the EEG data. The previous work mentioned above has demonstrated a correspondence between \(\tilde{h}_e\) and sleep EEG data with regard to these general features, so we feel confident in using it for our analysis without the addition of a scalp electrode model.

## 3 Locally linear embedding (LLE)

Locally linear embedding is a method of nonlinear dimensionality reduction that was originally introduced in Roweis and Saul (2000). It is useful for visualizing high-dimensional data sets as they would be embedded in a low-dimensional space, and it can often uncover relationships and patterns that are masked by the complexity of the original data set. It has been used to obtain maps of facial expressions and classify handwritten digits (Saul et al. 2003), as well as discriminate between normal and pre-seizure EEG measurements (Ataee et al. 2007). Here we will use LLE to characterize sleep EEG data and the numerical solutions of the cortical model. By embedding both in a two-dimensional space, we will be able to associate traditional EEG sleep stages with the continuous spectrum of states provided by the model.

### 3.1 The algorithm

**X**of size

*D*×

*N*, where each column

**X**

_{i}represents one of the

*N*D-dimensional data points. Then the LLE algorithm consists of three steps:

- 1.
*Calculate the nearest neighbors of each data point***X**_{i}*in the D-dimensional space*. This can be done in several ways; for example, we might choose the*k*closest points based on Euclidian distance, or we may choose only the points within a sphere of a given radius. - 2.
*Determine the best reconstruction of each point using only its nearest neighbors*. Mathematically, this takes the form of a least squares minimization problem:where$$ \min_W \sum\limits^{N}_{i=1} \mid \mathbf{X}_i - \sum\limits^{k}_{j=1} W_{ij} \mathbf{X}_j \mid^2 \;, $$(11)*k*represents the number of nearest neighbors. Our goal is to choose the weights*W*that best reconstruct the original data points in the D-dimensional space, based on the criteria of least-squared error. Because we use only the nearest neighbors, we must have*W*_{ij}= 0 if**X**_{j}is not a neighbor of**X**_{i}. In addition, we guarantee invariance to translations by enforcing ∑_{j}*W*_{ij}= 1. Note that the minimization can be calculated individually for every*i*. - 3.
*Compute the low-dimensional output vectors***Y**_{i}. These are chosen to provide the best global reconstruction using the weights*W*from the previous step. Again, this can be formulated as a least squares minimization:Here we are making the assumption that the weights that give the best reconstruction in$$ \min_{\mathbf{Y}} \sum\limits^{N}_{i=1} \mid \mathbf{Y}_i - \sum\limits^{k}_{j=1} W_{ij} \mathbf{Y}_j \mid^2 \;. $$(12)*D*dimensions will also be the optimal weights in the lower-dimensional space. In this case, the*N*minimization problems are coupled by the elements of**Y**, so they must be solved simultaneously.

A detailed description of the algorithm and several examples are provided in Saul et al. (2003). In addition, a Matlab implementation of LLE is available on the authors’ website (Roweis and Saul 2009); it was used to generate all results presented here.

**X**consists of a random sampling of points from the manifold (Fig. 2(b)), and the LLE output for a reduction to two dimensions is displayed in Fig. 2(c). Here we see that LLE successfully unravels the manifold and uncovers its true 2D nature.

A possible source of confusion with locally linear embedding is the interpretation of output dimensions such as *Y*_{1} and *Y*_{2}. Unlike linear methods such as principal component analysis, LLE does not provide a description of the output vectors in terms of the original *D* dimensions. The elements of **Y** are chosen to give the best *local* reconstructions based on a global minimization problem; this means that the interpretation of **Y** is different for every data point, and it cannot be described by a simple combination of the original dimensions.

### 3.2 Strongly connected components

The use of the LLE algorithm is based on the assumption that the entire data set lies on the same manifold in high-dimensional space. If more than one manifold is present, the locally linear reconstructions will no longer be accurate (imagine, for example, a point with nearest neighbors located on two separate manifolds). Therefore, before using LLE on a data set, we must verify this assumption.

*strongly connected component*as a group of points where the arrows created by nearest neighbor associations allow for travel from every point in the group to every other point in the group (Tarjan 1972).

*When a group of data points is strongly connected, this indicates that they lie on the same manifold*(Saul et al. 2003).

The example in Fig. 3 has two strongly connected components: points 1, 2, 5, and points 3, 4, 6. However, the two groups are not strongly connected together; one can move from the first group to the second through the connection between 4 and 5, but there is no way to get from the second group to the first. Point 7 is not strongly connected to any other point. Therefore, to use LLE on this sample data set, we would remove point 7 and use the algorithm separately on each strongly connected component.

There are several ways to identify the strongly connected components of a data set. The most traditional method involves an algorithm based on depth-first search of the directed graph (Tarjan 1972). An alternative method relies on analysis of the eigenspace that results from the LLE calculations (Polito and Perona 2001). It is also true that choosing the nearest neighbors in a different manner or increasing the value of *k* can change the structure of the strongly connected components. However, for the purposes of this study, we used the MATLAB function dmperm on a matrix containing the nearest neighbor associations for the data set. This function, based on the Dulmage-Mendelsohn decomposition, permutes the rows and columns of a matrix to put it into block diagonal form; by including the fact that every point is a neighbor with itself, we can guarantee that this permutation will be symmetric. As output, dmperm provides the new order of rows and columns and identifies the blocks of the permuted matrix, where each block represents one strongly connected component within the data.

A remark about principal component analysis (PCA) is in order. This is perhaps the most common mode of dimensionality reduction, and it has also been used in the analysis of sleep EEG data (Gervasoni et al. 2004; Jobert et al. 1994; Corsi-Cabrera et al. 2000). However, PCA places the greatest importance on the directions of largest variance and relies on the assumption that the data is best reconstructed by a *linear* combination of the original measurements. While we tried PCA and achieved reasonable results, the nonlinear nature of the sleep manifold suggests that a more sophisticated solution is necessary. In addition, the concept of nearest neighbors on which the LLE algorithm is based enabled improvement in the separation of different sleep stages (see Section 4.4), and it played a crucial role in defining the quantitative relationship between the EEG data and mathematical model, as is discussed in Section 5.2.

## 4 LLE applied to sleep EEG data

Before examining the connection between EEG data and the mathematical model of the sleep cycle, we will first discuss the results of applying LLE to sleep EEG only. After introducing the data sets and our methods, we show that LLE can separate EEG data by sleep stage and provide a continuous representation of sleep depth.

### 4.1 Sleep EEG data

The EEG data used for this analysis was obtained from the Sleep-EDF database (Kemp 2009), which is part of the PhysioBank online resource of physiologic signals for biomedical research (Goldberger et al. 2000). We used four data sets (sc4002e0, sc4012e0, sc4102e0, and sc4112e0), each one consisting of a European data format (EDF) file and a file containing the hypnogram data. They were converted to ASCII format and then imported into Matlab.

The data were gathered in 1989 from healthy males and females between the ages of 21 and 35. Recordings were obtained over the course of one full day and include horizontal electrooculogram (EOG), two channels of EEG (Fpz-Cz and Pz-Oz sampled at 100 Hz), submental-electromyogram (EMG) envelope, oro-nasal airflow, and rectal body temperature. However, we used only the data from the Fpz-Cz EEG electrode pair in our analysis. The hypnogram data was generated via manual scoring according to Rechtschaffen & Kales using the two channels of EEG. For more details on the subjects, recording methods, and sleep staging, please see the full description in Mourtazaev et al. (1995).

### 4.2 LLE input based on EEG features

In order to use the EEG as an input to the LLE function, we need to define our high-dimensional data set. We do this by dividing the signal into non-overlapping windows and calculating both statistical and frequency-based *features* for each one. Therefore each window becomes one high-dimensional data point, where the dimension equals the number of features. Because the data was scored using 30-second epochs, this was a natural choice for the window length. Thus, if we have 100 minutes of EEG data and we calculate six features, we will input 200 six-dimensional points into LLE and seek the embedding in two dimensions.

*Power in different frequency bands*This group of five features consists of total power in the delta (up to 4 Hz), theta (4–7.5 Hz), alpha (7.5–12 Hz), beta (12–26 Hz), and gamma ranges (above 26 Hz).*Total power*This is the total power in all five frequency bands.*Statistical measures*These include variance, skewness, and kurtosis. Whereas the variance captures the spread of the data and is always positive, skewness is a measure of the asymmetry around the sample mean, i.e. negative skewness indicates that more data points lie below the mean than above. Kurtosis is a measure of how prone the distribution is to outliers; a signal with high kurtosis has infrequent large deviations from the mean.*Spindle score*The spindle detector identifies segments of the EEG signal where the difference between consecutive points changes from positive to negative five times in a row, thereby creating two peaks and two troughs. The lag parameter*τ*_{L}defines the number of sample points spanned by each rise or fall within the sought-for spindle, so it can be adjusted to search for these motifs at lower frequencies. We set*τ*_{L}= 5, which allows for detection of 8–12 Hz spindles in data sampled at 100 Hz (with the maximum response occurring for spindles at 10 Hz), and we used a minimum threshold of zero. The overall spindle score indicates the percentage of the signal that was classified as spindle activity. Matlab code for this function is provided in McKay et al. (2010).*Permutation entropy*Similar to the spindle score, the permutation entropy (PE) identifies motifs in the EEG data, such as peaks, troughs, and slopes. The PE has its maximum value when there is an equal distribution of all motifs and its minimum value when only a single motif is present. In this way, it is a measure of the “flatness” or “uncertainty” of the signal. Here we use the composite permutation entropy index (CPEI), which combines the PE with*τ*_{L}= 1 and*τ*_{L}= 2 with a minimum threshold level. In our study, we set the threshold at 1% of the inter-quartile range of the EEG data. Further descriptions of this measure and an associated MATLAB function can be found in Olofsen et al. (2008).The CPEI has been found to be a good measure of anesthetic depth, and the motif-based methods used for permutation entropy and spindle detection are generally robust to noise. This is demonstrated in Olofsen et al. (2008), where the CPEI is calculated for both a time-varying signal and the same signal with added white noise of various magnitudes. As mentioned above, the noise threshold for PE is built into the calculation. These reasons (and the availability of published MATLAB code) led us to choose motif-based methods over more common parametric measures.

*Properties of log power*These four features are based on the log of the power spectral density (PSD), as obtained by Welch’s method. First, we omit the delta and alpha peaks and calculate the slope and offset of a linear fit. We then determine the maximum value of the PSD above the linear estimate in the alpha range (8–17 Hz) and the maximum value of the PSD in the delta range (0.5–4 Hz). These values will generally be large when a prominent peak is present. The code for generating these features was based on a Matlab function found in Leslie et al. (2009).*Power fractions*The low power fraction is obtained by summing the power in the delta and theta ranges and dividing by total power. Similarly, the high power fraction is calculated by summation of the power in the beta and gamma ranges and dividing by total power.

The selection of a subset of features from this list may seem like a difficult task. It is certainly an important one—the use of all 17 features or a “nonsensical” subset will give poor results. However, it is worth noting that there are *many* combinations that result in a satisfactory separation between sleep stages in the LLE embedding. While each one may be slightly different, there will be a large number of high quality with respect to discrimination.

### 4.3 Automated ranking of feature sets

- 1.
*How many nearest neighbors should we include?*In other words, what is the value of*k*? The LLE embedding will be stable over a range of values; we generally expect that*k*will be greater than the number of output dimensions and smaller than the original number of dimensions*D*(Saul et al. 2003). - 2.
*What should be the dimensionality of the LLE output space?*A nice property of the LLE algorithm is that each dimension is preserved as additional dimensions are added. Therefore, if we look at the results in two dimensions and do not achieve the desired mapping, we can add a third dimension without affecting the first two. - 3.
*Which combination of features should we use?*Employing all 17 features in our LLE analysis does not guarantee good separation between sleep stages because some of the features may not show consistent variation as the sleep depth changes. In addition, some features, such as the variance and the power in the delta band, show similar trends; we may achieve better results by eliminating these redundancies.

While we were able to identify many effective feature combinations through educated guesswork, we wanted to evaluate the utility of LLE as a method of sleep staging by identifying the best possible results. In this case, the “best” results are those that provide a large separation between sleep stages, especially between REM and deep slow-wave sleep. Because testing each combination of the features is an onerous task, e.g. choosing six features from a pool of 17 results in 12376 combinations, we developed an algorithm to evaluate the results automatically. It first identifies two groups of points: those marked as REM in the hypnogram and those determined to be stage 4. It then tracks two parameters based on the separation between those two groups of data points as they are embedded in the LLE output space.

*μ*and

*σ*are the mean and standard deviation, respectively. This is based on the assumption that the best separation occurs when the distance between the means is large and the total standard deviation is small. We perform this calculation in both the

*Y*

_{1}and

*Y*

_{2}directions and combine those measurements using the 2-norm to obtain the first parameter:

The second parameter *B* uses the concept of nearest neighbors to evaluate separation; for example, if the stage 4 data points have only other stage 4 points as nearest neighbors, then we can infer that they are completely separated from the other sleep stages. More specifically, it measures the number of stage 4 points with REM points as nearest neighbors and divides that by the total number of stage 4 points. If the stage 4 group is isolated, we will have *B* = 0.

We determined the values of *A* and *B* for all possible combinations of six features. There were 267 feature sets where *A* exceeded a threshold of 90% separation in each direction: \(A > \sqrt{90^2 + 90^2}\). We then identified the 267 feature sets with the lowest values of *B*. By finding the combinations that were common to both groups, we identified the 11 best feature sets. Visual inspection of the LLE results for these combinations confirmed the desired separation between REM and stage 4 sleep. Note that all 11 of these feature combinations provided results with *B* = 0.

### 4.4 Separation of sleep stages via LLE

*k*= 13) are displayed in Fig. 5(a). Every point in this figure represents a 30-second window of EEG data, and the color and symbol represent the sleep stage as determined by manual scoring. Here we see a very clear separation between the REM points (red circles) and those from stage 4 (blue stars), as required by our criteria for the automatic selection of the feature set. Stages 1 through 3 are located between those two groups and are arranged by sleep depth. In this example, we see a general trend of increasing sleep depth as we move to the upper right corner of the space. In addition, this low-dimensional embedding provides results with a smooth temporal progression. This is demonstrated by Fig. 5(b), where the LLE results from Fig. 5(a) are plotted versus time. In this example, the gradual transition to deep stage 4 sleep and the quick transition to REM are visible in the plot of

*Y*

_{1}.

*Y*

_{1}–

*Y*

_{2}output space when LLE is performed on all 178 data points. Here, the feature set consisted of power in the delta, theta, and gamma bands, total power, maximum value of the PSD in the alpha band, and the low power fraction. Again, each point represents 30-seconds of EEG data, and the symbol (and color) are assigned based on its designated sleep stage. While there is some visible separation between the stages, the overall trend is unclear. On the other hand, Fig. 6(b) shows the results when LLE is applied to the largest strongly connected component within the data. This component was identified as described in Section 3.2, and all other points were removed before using LLE. This greatly improves the results; the data points are spread further apart, and we see a grouping of sleep stages similar to Fig. 5. Sometimes the removal of weakly connected points has a very small impact on the results, but situations like this make it a necessity. The significant improvement for this feature set allowed it to be counted as one of the “best” 11 results discovered by the automatic algorithm.

3 points from waking (epochs 1,620, 1,717, and 1,718)

2 points from stage 2 (epochs 1,619 and 1,624)

2 points from stage 3 (epochs 1,712 and 1,713)

1 point from stage 4 (epoch 1,665)

## 5 Integration of EEG data and the model sleep cycle

Thus far, we have shown that LLE is capable of distinguishing between sleep stages using only one channel of EEG and that the embedding exhibits a smooth progression over time. However, remember that our original goal was to find the relationship between EEG data and the mathematical model of the sleep cycle. Here we accomplish this by applying LLE *simultaneously* to EEG data and simulated data from the model.

### 5.1 Model data set

To generate the model data set, we place a grid of points on the sleep manifold (Fig. 1) and obtain the numerical solution of the cortical model at each one. We vary *L* over the interval [0.5, 2] in increments of 0.1 and \(\Delta h_e^{rest}\) over [−5, 5] in increments of 0.5. This gives us a total of 336 model signals for analysis; we then remove the initial transients and characterize each signal based on a subset of the features described in Section 4.2. In this way, the nonlinear sleep manifold is turned into “EEG-like” signals which are converted to high-dimensional data points for use with LLE.

For the model data set, the length of each signal is 10 s (as opposed to the 30-second windows used for the EEG data). We are able to use this shorter time because we can choose parameters in the model to simulate a stationary brain state, i.e. we can use constant values of *L* and \(\Delta h_e^{rest}\). A test of the feature calculations for various window lengths indicated that, in many cases, the signal properties were stationary for windows greater than five seconds. Certain parts of the sleep manifold had transients lasting roughly 10 seconds.

*α*, which defines the variance of the stochastic input to the model cortex in Eq. (10), does not reproduce this behavior. Therefore, we varied the value of

*α*as we moved in the

*L*-\(\Delta h_e^{rest}\) space. More specifically, we based it on the sleep manifold. Define \({\vec\mu}_e\) to be a matrix of the steady-state values of

*h*

_{e}after they have been shifted and scaled to have a range of [0, 1]. Then we define a matrix of

*α*values:

*μ*

_{e}∼1)) will have stochastic inputs of

*α*, while the lower NREM section (where

*μ*

_{e}∼0) will have inputs of variance 8

*α*. This stochastic input allowed us to successfully reproduce the desired range of variances in the model signals.

*L*and \(\Delta h_e^{rest}\) that covers the sleep manifold. For reference, the steady state values of

*h*

_{e}on the sleep manifold are shown in Fig. 7(a); note that this is similar to viewing Fig. 1 from the top and coloring the points based on their height. The lowest value is plotted in white and the highest value in black.

*L*and large values of \(\Delta h_e^{rest}\). As a means of comparison, the same five features were applied to a sample of EEG data and are displayed in Fig. 8.

### 5.2 Application of LLE to a hybrid data set

*k*= 14, and only three points were removed by analysis of the strongly connected components.

In Fig. 9(a), the model data is represented by dots, where the color denotes the steady-state value of *h*_{e} associated with that point; in general, the red points represent the REM portion of the manifold, while the blue points represent NREM. On the other hand, the sleep EEG data points are rings, where the color is chosen based on sleep stage. Note that, for clarity, only the first 500 EEG data points were included in the figure.

The most important aspect of this result is that the EEG data points and model points overlap each other in the *Y*_{1}–*Y*_{2} output space. This implies that model points have EEG data points as nearest neighbors (and vice versa) and verifies that LLE has associated the two data types with one another. Without fidelity of the model and careful choice of EEG features, we would have likely obtained a result with one cluster of EEG points and a completely separate cluster of model points. Further, LLE appears to have matched the sleep stages between the two data types—the deepest sleep (blue for both EEG and model) appears in the lower left corner, and REM (red) is embedded in a vertical band where *Y*_{1} is in the range [ − 1, 0]. The separation between sleep stages can be seen more clearly in Fig. 9(b), which displays only the EEG data points from Fig. 9(a). Here we see that the stages are grouped; even the REM points and the awake points are separated, despite the fact that their EEG traces are characterized by very similar features. If we were to plot the *Y*_{1} and *Y*_{2} values of the EEG data points as they evolve in time, we would see a very similar result to the one in Fig. 5(b). Here, the *Y*_{1} direction appears to be an approximate indicator of sleep depth.

### 5.3 Connection to the theoretical sleep manifold

So far, we have seen that LLE provides a qualitatively similar embedding for REM and NREM points in both EEG measurements and simulated model data. However, we would like to quantify this relationship. In other words, we would like to associate each EEG data point with a position on the sleep manifold in the *L*-\(\Delta h_e^{rest}\) space. This will allow us to infer the model trajectory of a subject’s actual brain state as it moves along the manifold.

To do this, we use the results in Fig. 9(a) and again turn to the concept of nearest neighbors. Using *k* = 14, we calculate the nearest neighbors of every point in the *Y*_{1}–*Y*_{2} space. We then identify *model* points that are nearest neighbors of *EEG* data points. Each one of those model points has an associated position on the sleep manifold; we assume that the *L*-\(\Delta h_e^{rest}\) positions of the model nearest neighbors will be the most closely associated positions for the EEG data point.

*L*-\(\Delta h_e^{rest}\) for the sleep stage of the EEG point. We then create grayscale plots of the total counts, where white indicates that a location was never a nearest neighbor of that sleep stage and black indicates that it was a nearest neighbor many times.

For example, (i)–(vi) in Fig. 10(a) correspond to awake, REM, and stages 1–4, respectively. The thick vertical line at *L* = 1.2 marks the approximate location of the fold. As we move from REM to the deeper stages of sleep, we can see a continuous progression along the sleep manifold. In this example, REM and stage 1 sleep generally associate themselves with locations on the right half of the manifold (and a small piece of the lower left corner). Then in stage 2 sleep, we move to the left half of the manifold; here, we see two distinct groups of points, with a majority landing in the group that borders the area associated with REM and Stage 1. Stage 3 is associated with a cluster of points starting in the upper left-hand corner and approaching the fold. Stage 4 continues this progression and is located in a band of points leading up to the fold.

We can then create a composite plot that combines all five sleep stages. We neglect the waking points for this task because the current model does not effectively distinguish between the waking and REM states, although this is certainly an issue that may be addressed in the future. For every location on the manifold, we determine which sleep stage it was most closely associated with and color it accordingly. To do this, we scale the number of nearest neighbors for each stage by the total number for that stage; then, for every position on the manifold, we choose the stage with the highest value. This accounts for the fact that the subjects do not spend an equal amount of time in each sleep stage (otherwise, more time spent in a certain stage would lead to more nearest neighbors and a greater likelihood of dominating this composite plot). As in previous figures, we use red for REM, yellow for stage 1, green for stage 2, cyan for stage 3, and blue for stage 4. The intensity of the color is assigned based on the percentage of times it was associated with that sleep stage. Suppose a certain point on the manifold was a neighbor of stage 2 twelve times, a neighbor of stage 1 five times, and a neighbor of REM three times. We would color that point green to indicate stage 2 sleep, and its saturation value would be 12/(12 + 5 + 3) = 0.6. In other words, the intensity of the color is a “confidence” measure; the more saturated the color, the more closely it is associated with that sleep stage. The composite figure for the data in Fig. 10(a) is shown in Fig. 10(b).

### 5.4 Inclusion of additional data sets

**x**, this is a three-step process (Saul et al. 2003):

- 1.
Find the

*k*nearest neighbors of each new data point among the points in the existing embedding. - 2.
Compute the best linear reconstruction

*w*_{j}of each new point using only its nearest neighbors. Again, we enforce the constraint that the weights used in the reconstruction sum to one: ∑_{j}*w*_{j}= 1. - 3.
Calculate the output for the new data points:

**y**= ∑_{j}*w*_{j}**Y**_{j}, where**Y**contains the original embedding coordinates and*j*cycles through the neighbors of**x**.

This picture may be very useful in the analysis of seizures during sleep. Imagine taking another new sleep EEG data set, this time from an epileptic subject, and projecting it onto these results. By following the location in *L*-\(\Delta h_e^{rest}\) as the subject sleeps, we can get an idea of the sleep stage as it is traditionally defined, and *we can also identify that stage in more detail and detect nearness to transitions between stages*. The grid of points on the sleep manifold essentially gives us descriptions of 336 different brain states associated with sleep. We expect that future research will identify the locations on the sleep manifold where seizures are most likely to occur. With that knowledge, if the sleep state characterization is done continuously while the subject is sleeping, this may allow for the prediction (and possibly prevention) of seizures.

We emphasize the fact that the coloring in the composite pictures (Figs. 10(b), 11, and 12(b)) is based on the subjective scoring of sleep data. The reliability of categorizing individual epochs of data has been reported at 73% for scorers from different labs (Norman et al. 2000) and as high as 90% for scorers from the same lab (Whitney et al. 1998). It has also been shown that reliability varies by sleep stage, with stage 2 having the highest level of agreement between scorers (78.3%) and stage 1 having the lowest (41.8%) (Norman et al. 2000). This certainly affects our results. For example, imagine if some of the points scored as REM that landed in the range 1.6 < *L* < 2 on the sleep manifold were instead scored as stage 1. Then the right side of the composite picture would be completely yellow and the region associated with stage 1 would be more clear. Therefore, the composite pictures should be seen as “guides” to tie the analysis back to the traditional definitions of the sleep stages, not as the ultimate truth. As mentioned in the previous paragraph, we are most interested in the *position* on the sleep manifold, the trajectory that results as the subject sleeps, and the relationship of this trajectory to the regions where seizures may be most likely to occur.

## 6 Summary

Mathematical models represent an opportunity for exploration and prediction. In this case, a model of the human sleep cycle creates the possibility for a more detailed description of sleep states, with application to the prediction and analysis of seizures during sleep. The first step in such an endeavor is always to connect the model to the real world through experimental data.

Here we have used locally linear embedding to directly associate human sleep EEG data with the mathematical model. We first showed that LLE has the ability to distinguish between sleep stages when applied to EEG data alone. This analysis can reliably separate REM and NREM sleep data and provide a smooth temporal progression through the various stages of sleep. We also presented the concept of strongly connected components as a method of automatic outlier rejection for EEG data and discussed a method for the selection of EEG features used in the analysis. Then, by using LLE on a hybrid data set containing both sleep EEG and signals generated from the mathematical sleep cycle, we were able to quantify the relationship between the model and the data. This enabled us to take any sample of sleep EEG data and associate it with a position among the continuous range of sleep states provided by the model. In addition, this approach yields consistent results for various subjects over a full night of sleep and can be done online as the subject sleeps. This suggests a wide range of possibilities for future investigation.

## Acknowledgements

This work was supported through a National Science Foundation Graduate Research Fellowship to B. A. Lopour. It was also supported, in part, by a Mary Elisabeth Rennie Epilepsy and Epilepsy-related Research Grant. We extend a special thanks to Kelly Clancy and Albert Kao for work which served as the starting point for this project, done as part of a course in computational neuroscience at UC Berkeley taught by Professor B. A. Olshausen.

**Open Access**

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.