Blood Glucose Prediction with Variance Estimation Using Recurrent Neural Networks
Many factors affect blood glucose levels in type 1 diabetics, several of which vary greatly both in the magnitude and in the delay of their effect. Modern rapid-acting insulins generally peak after 60–90 min, while carbohydrate intake can affect blood glucose levels more rapidly for high glycemic index foods, or more slowly for other carbohydrate sources. Good estimates of near-future glucose levels are important both for diabetic patients managing their insulin distribution manually and for closed-loop systems making decisions about the distribution. Modern continuous glucose monitoring systems provide excellent sources of data to train machine learning models to predict future glucose levels. In this paper, we present an approach for predicting blood glucose levels for diabetics up to 1 h into the future. The approach is based on recurrent neural networks trained in an end-to-end fashion, requiring nothing but the glucose level history for the patient. Our approach obtains results that are comparable to the state of the art on the Ohio T1DM dataset for blood glucose level prediction. In addition to predicting the future glucose value, our model provides an estimate of its certainty, helping users to interpret the predicted levels. This is realized by training the recurrent neural network to parameterize a univariate Gaussian distribution over the output. The approach needs no feature engineering or data preprocessing and is computationally inexpensive. We evaluate our method using the standard root-mean-squared error (RMSE) metric, along with a blood glucose-specific metric called the surveillance error grid (SEG). We further study the properties of the distribution learned by the model, using experiments that determine the nature of the certainty estimate the model is able to capture.
Keywords: Recurrent neural networks · Blood glucose prediction · Type 1 diabetes
Our future will be recorded and quantified in unprecedented temporal resolution. A rapidly increasing variety of variables is stored, describing activities we engage in as well as physiological and medical phenomena. One example is the increasingly wide adoption of continuous blood glucose monitoring (CGM) systems, which have given type 1 diabetics (T1D) a valuable tool for closely monitoring and reacting to their current blood glucose levels and trends. CGM data helps patients manage their insulin distribution by providing an informative source of data to act upon. CGM availability has also been of crucial importance for the development and use of closed-loop systems such as OpenAPS . Blood glucose levels adhere to complex dynamics that depend on many different variables (such as carbohydrate intake, recent insulin injections, physical activity, stress levels, the presence of an infection in the body, sleeping patterns, hormonal patterns, etc.) [4, 9]. This makes predicting short-term blood glucose changes (up to a few hours) a challenging task, and machine learning (ML) an obvious candidate for improving patient care. However, acquiring domain expertise, understanding sensors, and hand-crafting features is expensive and does not easily scale to further applications. Sometimes natural, obviously important, and well-studied variables (e.g., caloric intake for diabetics) may be too inconvenient for end-users to measure. Deep learning approaches, on the other hand, are a step towards automated machine learning, as features, classifiers, and predictors are learned simultaneously. They thus present a possibly more scalable solution to the myriad of machine learning problems in precision health management resulting from technology changes alone.
- It is feasible to predict future glucose levels from glucose levels alone.
- Appropriate models can be trained by non-experts without feature engineering or complicated training procedures.
- The proposed model can quantify the uncertainty in its predictions to alert users to the need for extra caution or additional input.
Our method was trained and evaluated on the Ohio T1DM dataset for blood glucose level prediction (see  for details).
2 Modeling Blood Glucose Levels Using Recurrent Neural Networks
The output vector from the final LSTM cell (see ht in Fig. 1) in the sequence is fed through a fully connected neural network with two hidden dense layers and one output layer. The hidden layers consist of 512 and 256 neurons respectively, with rectified linear activations and a dropout of 20% and 30% respectively. The dropout layers mitigate over-fitting the model to the training data. The output layer consists of two neurons: one with a linear activation and one with an exponential activation.
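As a concrete illustration, the output head described above can be sketched in plain NumPy (an illustrative sketch with random placeholder weights, not the authors' implementation; the Gaussian negative log-likelihood shown is the natural training loss for such a two-neuron output, and dropout is omitted since it only applies during training):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b, activation=None):
    """Fully connected layer: x @ w + b, with optional ReLU activation."""
    z = x @ w + b
    if activation == "relu":
        return np.maximum(z, 0.0)
    return z

# Hidden sizes from the text: 512 and 256 ReLU units on top of the LSTM output h_t.
h = rng.standard_normal(128)                              # placeholder LSTM output
w1, b1 = rng.standard_normal((128, 512)) * 0.01, np.zeros(512)
w2, b2 = rng.standard_normal((512, 256)) * 0.01, np.zeros(256)
w_out, b_out = rng.standard_normal((256, 2)) * 0.01, np.zeros(2)

a1 = dense(h, w1, b1, "relu")
a2 = dense(a1, w2, b2, "relu")
out = dense(a2, w_out, b_out)

mu = out[0]             # linear activation -> mean of the Gaussian
sigma = np.exp(out[1])  # exponential activation -> strictly positive std deviation

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under N(mu, sigma^2), the training criterion
    for a network that parameterizes a univariate Gaussian over the output."""
    return 0.5 * np.log(2 * np.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)
```

The exponential activation guarantees a positive standard deviation without any constrained optimization.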
Physiological Loss Function
We also trained the model with a glucose-specific loss function , which is a metric that combines the mean squared error with a penalty term for predictions that would lead to contraindicated interventions possibly leading to clinically critical situations.
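The published metric is not reproduced here, but the general idea of combining a squared error with a penalty for clinically risky errors can be sketched as follows (an illustrative penalized MSE; the thresholds, weighting scheme, and function name are assumptions for illustration, not the cited loss):

```python
import numpy as np

def penalized_mse(y_true, y_pred, hypo=70.0, hyper=180.0, alpha=1.0):
    """Illustrative penalized MSE: errors that overestimate glucose during
    hypoglycemia (y_true < hypo) or underestimate it during hyperglycemia
    (y_true > hyper) receive an extra weight of (1 + alpha), since those
    errors could lead to contraindicated interventions."""
    err = y_pred - y_true
    risky = ((y_true < hypo) & (err > 0)) | ((y_true > hyper) & (err < 0))
    weights = 1.0 + alpha * risky
    return np.mean(weights * err ** 2)
```

For example, overestimating a hypoglycemic value of 60 mg/dl by 20 mg/dl is weighted twice as heavily as the same error at a normal glucose level.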
2.1 Preliminary Study
Preliminary results from this study were presented at The 3rd international workshop on knowledge discovery in healthcare data at ICML/IJCAI 2018 . However, since the preliminary workshop paper, the proposed model has been further refined by a more thorough exploration of hyperparameters and changes to the model design (such as the activation functions), and the results have consequently improved. This paper also includes a more thorough analysis, such as surveillance error grid analysis and an investigation of the variance predictions using controlled synthetic data. The model in the current study is trained on all available training data whereas the preliminary study considered models trained specifically for one patient at a time.
2.2 Experimental Setup
The number of blood glucose level measurements that are used as training and testing data for each patient in the Ohio T1DM dataset for blood glucose level prediction
There are other data self-reported by the patients such as meal times with carbohydrate estimates; times of exercise, sleep, work, stress, and illness; and measures of heart rate, galvanic skin response, skin temperature, air temperature, and step count. In this work, we consider the problem of predicting future blood glucose levels using only previous blood glucose level measurements. The only preprocessing done on the glucose values is scaling by 0.01 as in  to get the glucose values into a range suitable for training.
For all patients, we take the first 60% of the data and combine it into a training set, take the following 20% and combine it into a validation set used for early stopping, and choose the hyperparameters by root-mean-squared error performance on the last 20% of the data.
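This chronological per-patient split can be sketched as follows (a minimal NumPy sketch with toy data; the series and helper names are illustrative):

```python
import numpy as np

def split_patient(series, train_frac=0.6, val_frac=0.2):
    """Split one patient's glucose series chronologically into train/val/test."""
    n = len(series)
    i, j = int(n * train_frac), int(n * (train_frac + val_frac))
    return series[:i], series[i:j], series[j:]

# Toy series for two hypothetical patients (values scaled by 0.01, as in the text).
patients = [np.arange(100) * 0.01, np.arange(50) * 0.01]
splits = [split_patient(p) for p in patients]

# Combine the per-patient partitions into shared train/val/test sets.
train = np.concatenate([s[0] for s in splits])
val = np.concatenate([s[1] for s in splits])
test = np.concatenate([s[2] for s in splits])
```

Splitting chronologically before combining avoids leaking each patient's future values into the training set.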
We then choose the learning rate and the batch size by fixing the number of LSTM units and the amount of history used, while varying the learning rate between 10⁻³ and 10⁻⁵ and the batch size between 128 and 1024. The converged models give approximately the same validation loss across learning rates and batch sizes, but a learning rate of 10⁻³ and a batch size of 1024 lead to faster convergence and are therefore chosen.
The final models were trained using 60 min of glucose level history for predictions 30 and 60 min into the future. The setup for the final training was to train on the first 80% of the glucose level training data from all patients and to do early stopping on the last 20%. The final models were trained with the Adam optimizer, a learning rate of 10⁻³, a batch size of 1024, a maximum of 10,000 epochs, and an early stopping patience of 200 epochs. We train 100 models with different random initializations of the parameters and report the mean evaluation score for all 100 models on the test data. A link to the source code of the model and the training scripts is provided in Appendix A.
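The early-stopping logic described above can be sketched generically (the `step` callback standing in for one epoch of training plus validation is a hypothetical placeholder):

```python
def train_with_early_stopping(step, max_epochs=10_000, patience=200):
    """Generic early-stopping loop: stop after `patience` epochs without
    improvement in validation loss; return the best epoch and its loss."""
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch in range(max_epochs):
        val_loss = step(epoch)  # one epoch of training + validation
        if val_loss < best_loss:
            best_loss, best_epoch, wait = val_loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch, best_loss

# Toy usage: validation loss improves until epoch 50, then plateaus.
best_epoch, best_loss = train_with_early_stopping(lambda e: max(1.0 - 0.01 * e, 0.5))
```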
The final models were evaluated on the officially provided test partition of the dataset. Root-mean-squared error (RMSE) and surveillance error scores are reported. Each CGM value in the test set is considered a prediction target provided that it is preceded by enough CGM history. The number of missing predictions depends on the number of gaps in the data, i.e., the number of pairwise consecutive measurements in the glucose level data where the time step is not exactly five minutes. We do not interpolate or extrapolate to fill the missing values, since it is unclear how much bias this would introduce; instead, we only use data for which it is possible to create the (x, y) pairs with a given glucose history, x, and regression target, y, for a given prediction horizon. As a result, we make predictions for approximately 90% of the test data. The discarded test points are not counted in the evaluation.
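The construction of (x, y) pairs that skips gaps can be sketched as follows (a minimal NumPy sketch; the function name and the toy 4-step history / 2-step horizon are illustrative, whereas the paper uses 60 min of history, i.e., 12 steps at 5-min resolution):

```python
import numpy as np

def make_pairs(times, glucose, history=4, horizon=2, step_min=5):
    """Build (x, y) pairs: x = `history` consecutive readings, y = the reading
    `horizon` steps after the end of x. A window is kept only if every
    pairwise time step inside it is exactly `step_min` minutes."""
    xs, ys = [], []
    for i in range(len(glucose) - history - horizon + 1):
        window_times = times[i : i + history + horizon]
        if np.all(np.diff(window_times) == step_min):
            xs.append(glucose[i : i + history])
            ys.append(glucose[i + history + horizon - 1])
    return np.array(xs), np.array(ys)

# Toy data: 20 readings at 5-min spacing, with one 10-min gap after t = 45.
times = np.array(list(range(0, 50, 5)) + list(range(55, 105, 5)))
glucose = np.arange(20.0)
xs, ys = make_pairs(times, glucose)
```

Windows that straddle the gap are dropped entirely, so no artificial values ever enter training or evaluation.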
In our experimental setup, training of the model could be performed on a commodity laptop. The model is small enough to fit in the memory of, and be used on, mobile devices (e.g., mobile phones, blood glucose monitoring devices). Training could initially be performed offline, and subsequent incremental training would be light enough to run either on the devices or offline.
Mean and standard deviation of the root-mean-squared error (RMSE) per patient over 100 different random initializations, and the mean over all patients, for predicting glucose levels 30 and 60 min into the future:

30 min            60 min
18.773 ± 0.179    33.696 ± 0.365
15.959 ± 0.374    28.468 ± 0.834
18.538 ± 0.106    31.337 ± 0.210
17.961 ± 0.192    29.012 ± 0.169
21.675 ± 0.218    33.823 ± 0.268
20.294 ± 0.107    32.083 ± 0.182
Mean: 18.867      31.403
The glucose level of patient 575 is harder to predict than that of patient 570, as seen in Table 2, where the mean RMSE for patient 570 is 15.959 and the mean RMSE for patient 575 is 21.675. We observe that patient 575 has higher glucose variability than patient 570: the percentage of first differences greater than 10 mg/dl per 5 min or lower than −10 mg/dl per 5 min is 7.3% for patient 575 and 3.0% for patient 570 in the test data. Abnormal rates of change are potentially harder to predict, which may partially explain why the performance is lower on patient 575 than on patient 570.
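The variability measure used above can be computed directly from first differences (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def pct_large_changes(glucose, threshold=10.0):
    """Percentage of 5-min first differences whose magnitude exceeds
    `threshold` mg/dl, a simple proxy for glucose variability."""
    diffs = np.diff(glucose)
    return 100.0 * np.mean(np.abs(diffs) > threshold)
```

A steady trace scores 0%, while a trace swinging by 20 mg/dl every 5 minutes scores 100%.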
Surveillance Error Grid
Note that the criterion is only defined for blood glucose concentrations up to 600 mg/dl, which is the limit of most CGMs; any model that predicts values outside of this region should be discarded or constrained.
Mean and standard deviation of the surveillance error per patient, and the mean over all patients, for 30- and 60-min prediction horizons:

30 min           60 min
0.178 ± 0.003    0.331 ± 0.003
0.105 ± 0.002    0.195 ± 0.004
0.177 ± 0.002    0.291 ± 0.002
0.176 ± 0.002    0.293 ± 0.002
0.224 ± 0.004    0.389 ± 0.005
0.256 ± 0.003    0.396 ± 0.003
Mean: 0.186      0.316
In this paper, we have proposed a recurrent neural network model that can predict blood glucose levels in type 1 diabetes for horizons of up to 60 min into the future using only blood glucose level as inputs. We achieve results comparable to state-of-the-art methods on the standard Ohio T1DM dataset for blood glucose level prediction.
Our results suggest that end-to-end machine learning is feasible for precision health management. This allows the system to learn all internal representations of the data, and reduces the human effort involved—avoiding labor-intensive prior work by experts hand-crafting features based on extensive domain knowledge.
Our model gives an estimate of the standard deviation of the prediction. This is a useful aspect for a system which will be used by CGM users for making decisions about the administration of insulin and/or caloric intake. The predicted standard deviation can also be a useful signal for downstream components in a closed-loop system, making automatic decisions for a patient. The results in Fig. 3 show the predicted standard deviation for patient 570 and patient 575, the ones where the model is the most and the least successful in prediction accuracy, respectively. One principal problem is that disambiguating between intra-patient variation and sensor errors is unlikely to be feasible.
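For example, a predicted standard deviation can be turned into an approximate interval, and unusually wide intervals can be flagged for extra caution (an illustrative sketch; the 50 mg/dl width threshold and the function names are assumptions, not from the paper):

```python
def prediction_interval(mu, sigma, z=1.96):
    """Approximate 95% interval for a Gaussian prediction N(mu, sigma^2)."""
    return mu - z * sigma, mu + z * sigma

def needs_caution(sigma, width_mgdl=50.0, z=1.96):
    """Flag predictions whose approximate 95% interval is wider than
    `width_mgdl` mg/dl, so a user or downstream controller can fall back
    to more conservative behavior."""
    return 2 * z * sigma > width_mgdl

# Toy usage: a prediction of 120 mg/dl with sigma = 10 mg/dl.
lo, hi = prediction_interval(120.0, 10.0)
```

A closed-loop controller could, for instance, suspend aggressive dosing whenever `needs_caution` fires.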
Physiological Loss Function
To our surprise, we did not see improvements when using a physiologically motivated loss function  for training (results not shown). This loss is essentially a smoothed version of the Clarke error grid . Of course, our findings are not proof that such loss functions cannot improve results; a larger-scale investigation, exploring in particular a larger area of the parameter space and different training regimes, might provide further insights. Penalizing errors in hypo- or hyperglycemic states should lead to better real-world performance, as we observed comparatively larger deviations at minima and maxima. One explanation for this is the relative class imbalance, as extrema are rare. This could be countered with data augmentation techniques.
Even though the different patients pose varying challenges for the prediction task (see Fig. 2), we obtain the best result when training our model on the training data from all patients at once. This suggests that there are patterns governing blood glucose variability that can generalize between different patients, and that the model benefits from having access to more data.
There are gaps in the training data with missing values. Most of the gaps are shorter than 10 h, but some are longer than 24 h. The missing data points account for roughly 23 out of 263 days of the total amount of patient data, or 9% of the data. The gaps could be filled using interpolation, but it is not immediately clear how this would affect either the training or the evaluation of the models, since it would introduce artificial values; filling a gap of 24 h using interpolation would not result in realistic data. Instead, we have chosen not to fill the gaps with artificial values and limit our models to be trained and evaluated only on real data. This has its own limitations, since we can only consider prediction targets with enough glucose level history and therefore cannot predict the initial values after a gap, but the advantage is that model training and evaluation are not biased by the introduction of artificial values.
Additional Patient Data
As mentioned in the description of the dataset, there are other data self-reported by the patients such as meal times with carbohydrate estimates, times of exercise, sleep, work, stress, and illness; and measures of heart rate, galvanic skin response, skin temperature, air temperature and step count. From the results in this work, we conclude that a simple setup using only CGM history obtains results that are on par with more complex solutions that do incorporate more features. It is well documented that the additional features do affect blood glucose dynamics but the dependencies may be more subtle and complex and thus harder to learn. This motivates further work to develop models that can leverage the additional information and make more accurate predictions.
5 Related Work
Early work on predicting blood glucose levels from CGM data includes Bremer et al. , who explored the predictability of data from CGM systems and showed how predictions can be made based on autocorrelation functions. Sparacino et al.  proposed a first-order auto-regressive model.
Wiley  proposed using support vector regression (SVR) to predict blood glucose levels from CGM data. They report an RMSE of 4.5 mg/dl, but this is on data that was aggressively smoothed using a regularized cubic spline interpolation. Bunescu et al.  extended this work with physiological models for meal absorption dynamics, insulin dynamics, and glucose dynamics to predict blood glucose levels 30 and 60 min into the future. They obtained a relative improvement of about 12% in prediction accuracy over the model proposed by Wiley. The experiments in  are performed on non-smoothed data.
There have been approaches using neural networks to predict blood glucose levels. Perez et al.  presented a feed-forward neural network (FFNN) taking CGM history as input and predicting the level 15, 30, and 45 min into the future. Their RMSE accuracy for 30-min predictions is similar to that of . Mougiakakou et al.  showed that RNNs can be used to predict blood glucose levels from CGM data. They evaluated their method on four different children with type 1 diabetes, with promising results, reporting an average RMSE accuracy of 24.1 mg/dl.
Some papers have incorporated additional information (e.g., carbohydrate/meal intake, insulin injections, etc.). Pappada et al.  proposed an FFNN taking as input CGM levels, insulin dosages, metered glucose levels, nutritional intake, lifestyle, and emotional factors. Despite having all this data at its disposal, the model makes predictions 75 min into the future with an RMSE score of 43.9 mg/dl. Zecchin et al.  proposed a neural network approach in combination with a first-order polynomial extrapolation algorithm to produce short-term predictions of blood glucose levels, taking into account meal intake information. The approach is evaluated both on simulated data and on real data from 9 patients with the Abbott FreeStyle Navigator. None of the above-mentioned approaches has the ability to output a confidence interval.
A problem when modeling continuous outputs trained with a least-squares criterion is that the model tends to learn a conditional average of the targets. Modeling a distribution over the outputs may limit this problem and make training more stable. Mixture density networks were proposed by Bishop [3]: by allowing the output vector from a neural network model to parameterize a mixture of Gaussians, they manage to learn a mapping even when the targets are not unique. Besides improving learning stability, this also allows the model to visualize the certainty of its predictions. A similar approach was used together with RNNs by Graves [11] to predict the distribution of the next position of a pen during handwriting.
The release of the Ohio dataset , in combination with the Blood glucose level prediction challenge (BGLP) at the Workshop on knowledge discovery in healthcare data (KDH) 2018, spurred further interest in blood glucose prediction models. At the workshop, a preliminary version of this study was presented . While a challenge was formulated, no clear winner could be declared because of differences in the evaluation procedure. The results listed below cannot be directly compared to the results in this paper due to these differences; however, they all refer to predictions made with a 30-min horizon. While our study has focused on predicting blood glucose levels using only the CGM history as input, all methods below use more of the features provided in the dataset, such as carbohydrate intake and insulin distribution, and none of them gives an estimate of the uncertainty.
Chen et al.  used a recurrent neural network with dilations to model the data. Dilations allow a network to learn hierarchical structures, and the authors chose to use the CGM values, insulin doses, and carbohydrate intake from the data, resulting in an average RMSE of 19.04 mg/dl. Xie et al.  compared autoregression with exogenous inputs (ARX) with RNNs and convolutional neural networks (CNNs), and concluded that the simpler ARX models achieved the best scores on the Ohio blood glucose data, with an average RMSE of 19.59 mg/dl. Contreras et al.  used grammatical evolution (GE) in combination with feature engineering to search for a predictive model, obtaining an average RMSE of 24.83 mg/dl. Bertachi et al.  reported an average RMSE of 19.33 mg/dl by using physiological models for insulin onboard, carbohydrates onboard, and activity onboard, which are fed as features to a feed-forward neural network. Midroni et al.  employed XGBoost with a thorough investigation of feature importance and reported an average RMSE of 19.32 mg/dl. Zhu et al.  trained a CNN with CGM data, carbohydrate intake, and insulin distribution used as features and obtained an average RMSE of 21.72 mg/dl.
In this paper, we presented a deep neural network model that learns to predict blood glucose levels up to 60 min into the future. The model parameterizes a univariate Gaussian output distribution, providing an estimate of the uncertainty in the prediction. Our results make a clear improvement over the baseline and motivate future work in this direction.
However, it is clear that the field is in desperate need of larger datasets and standards for evaluation. Crowdsourcing from patient associations would be one possibility, but differences in sensor types and sensor revisions, lifestyles, and genetic makeup are all obvious confounding factors. Understanding sensor errors by measuring glucose levels in vivo, for example in diabetes animal models, with several sensors simultaneously would be very insightful and would likely improve prediction quality. Another question concerns preprocessing in the sensors, which might be another confounding factor in the prediction. While protection of proprietary intellectual property is necessary, there have been examples, e.g., DNA microarray technology, where only a completely open analysis process, from the initial steps usually performed with the vendor's software tools to the final result, helped to realize the full potential of the technology.
Open access funding provided by RISE Research Institutes of Sweden. The authors would like to thank Christian Meijner and Simon Persson who performed early experiments in this project.
Compliance with Ethical Standards
Conflict of Interests
The authors declare that they have no conflict of interest.
- 2.Bertachi A, Biagi L, Contreras I, Luo N, Vehi J (2018) Prediction of blood glucose levels and nocturnal hypoglycemia using physiological models and artificial neural networks. In: 3rd International workshop on knowledge discovery in healthcare data, KDH@ICML/IJCAI 2018, 13 July 2018, pp 85–90
- 3.Bishop CM (1994) Mixture density networks. Tech. rep., Aston University
- 5.Bunescu R, Struble N, Marling C, Shubrook J, Schwartz F (2013) Blood glucose level prediction using physiological models and support vector regression. In: 2013 12th International conference on machine learning and applications (ICMLA), vol 1. IEEE, pp 135–140
- 6.Chen J, Li K, Herrero P, Zhu T, Georgiou P (2018) Dilated recurrent neural network for short-time prediction of glucose concentration. In: 3rd International workshop on knowledge discovery in healthcare data, KDH@ICML/IJCAI 2018, 13 July 2018, pp 69–73
- 8.Contreras I, Bertachi A, Biagi L, Vehi J, Oviedo S (2018) Using grammatical evolution to generate short-term blood glucose prediction models. In: 3rd International workshop on knowledge discovery in healthcare data, KDH@ICML/IJCAI 2018, 13 July 2018, pp 91–96
- 10.Favero SD, Facchinetti A, Cobelli C (2012) A glucose-specific metric to assess predictors and identify models. IEEE Transactions on Biomedical Engineering 59(5):1281–1290
- 11.Graves A (2013) Generating sequences with recurrent neural networks. arXiv:1308.0850
- 16.Marling C, Bunescu R (2018) The OhioT1DM dataset for blood glucose level prediction. In: The 3rd International workshop on knowledge discovery in healthcare data, Stockholm, Sweden. Available at http://smarthealth.cs.ohio.edu/bglp/OhioT1DM-dataset-paper.pdf
- 17.Martinsson J, Schliep A, Eliasson B, Meijner C, Persson S, Mogren O (2018) Automatic blood glucose prediction with confidence using recurrent neural networks. In: 3rd International workshop on knowledge discovery in healthcare data, KDH@ICML/IJCAI 2018, 13 July 2018, pp 64–68
- 18.Midroni C, Leimbigler P, Baruah G, Kolla M, Whitehead A, Fossat Y (2018) Predicting glycemia in type 1 diabetes patients: experiments with XGBoost. In: 3rd International workshop on knowledge discovery in healthcare data, KDH@ICML/IJCAI 2018, 13 July 2018, pp 79–84
- 19.Mirshekarian S, Bunescu R, Marling C, Schwartz F (2017) Using LSTMs to learn physiological models of blood glucose behavior. In: Proceedings of the annual international conference of the IEEE engineering in medicine and biology society, EMBS, pp 2887–2891. https://doi.org/10.1109/EMBC.2017.8037460
- 20.Mougiakakou SG, Prountzou A, Iliopoulou D, Nikita KS, Vazeou A, Bartsocas CS (2006) Neural network based glucose-insulin metabolism models for children with type 1 diabetes. In: Engineering in medicine and biology society, 2006. EMBS'06. 28th annual international conference of the IEEE. IEEE, pp 3545–3548
- 24.Wiley MT (2011) Machine learning for diabetes decision support. Master's thesis, Ohio University, pp 55–72
- 25.Xie J, Wang Q (2018) Benchmark machine learning approaches with classical time series approaches on the blood glucose level prediction challenge. In: 3rd International workshop on knowledge discovery in healthcare data, KDH@ICML/IJCAI 2018, 13 July 2018, pp 97–102
- 26.Zecchin C, Facchinetti A, Sparacino G, De Nicolao G, Cobelli C (2012) Neural network incorporating meal information improves accuracy of short-time prediction of glucose concentration. IEEE Transactions on Biomedical Engineering 59(6):1550–1560
- 27.Zhu T, Li K, Herrero P, Chen J, Georgiou P (2018) A deep learning algorithm for personalized blood glucose prediction. In: 3rd International workshop on knowledge discovery in healthcare data, KDH@ICML/IJCAI 2018, 13 July 2018, pp 64–78
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.