
A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs

Explainable AI in Healthcare and Medicine

Part of the book series: Studies in Computational Intelligence (SCI, volume 914)

Abstract

A large fraction of electronic health records (EHRs) consists of clinical measurements collected over time, such as lab tests and vital signs, which provide important information about a patient’s health status. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and large amounts of missing data, which complicate the analysis. In this work, we propose a novel kernel that is capable of exploiting both the information in the observed values and the information hidden in the missing patterns of multivariate time series (MTS) originating, e.g., from EHRs. The kernel, called TCK\(_{IM}\), is designed using an ensemble learning strategy in which the base models are novel mixed-mode Bayesian mixture models that can effectively exploit informative missingness without having to resort to imputation methods. Moreover, the ensemble approach ensures robustness to hyperparameters, and therefore TCK\(_{IM}\) is particularly well suited when labels are scarce, a known challenge in medical applications. Experiments on three real-world clinical datasets demonstrate the effectiveness of the proposed kernel.

Acknowledgements

The authors would like to thank K. Hindberg for assistance on extraction of EHR data and the physicians A. Revhaug, R.-O. Lindsetmo and K. M. Augestad for helpful guidance throughout the study.

Author information

Correspondence to Karl Øyvind Mikalsen.

Appendix—Synthetic benchmark datasets

To test how well TCK\(_{IM}\) performs for a varying degree of informative missingness, we generated a total of 16 synthetic datasets by randomly injecting missing data into 4 MTS benchmark datasets. The characteristics of the datasets are described in Table 4. We transformed all MTS in each dataset to the same length, T, where T is given by \( T = \left\lceil T_{max} / \left\lceil T_{max}/25 \right\rceil \right\rceil . \) Here, \( \lceil \cdot \rceil \) is the ceiling operator and \(T_{max}\) is the length of the longest MTS in the original dataset.
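As a quick check of the length formula above, the following minimal sketch evaluates \(T\) for a few illustrative values of \(T_{max}\). The function names `ceil_div` and `common_length` and the sample lengths are hypothetical and are not taken from Table 4; how the series are actually resampled to length \(T\) is not specified here and is not covered by the sketch.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def common_length(t_max: int, cap: int = 25) -> int:
    """T = ceil(T_max / ceil(T_max / cap)); cap = 25 in the text."""
    return ceil_div(t_max, ceil_div(t_max, cap))


# Illustrative lengths only (not the benchmark values).
for t_max in (20, 25, 26, 100, 198):
    print(t_max, "->", common_length(t_max))
```

Because \(\lceil T_{max}/25 \rceil \ge T_{max}/25\), the resulting \(T\) never exceeds 25, and series that are already at most 25 steps long keep \(T = T_{max}\).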

Table 4 Characteristics of benchmark datasets. Attr is number of attributes, Train and Test number of training and test samples. \(N_c\) is the number of classes, \(T_{min}\) and \(T_{max}\) length of shortest and longest MTS, and T is the MTS length after the transformation

Datasets. The following procedure was used to create 8 synthetic datasets with missing data from the Wafer and Japanese vowels datasets. We randomly sampled a number \(c_v \in \{-1, 1\} \) for each attribute \(v \in \{1,\dots , V\} \), where \(c_v =1\) indicates that the attribute and the labels are positively correlated and \(c_v =-1\) that they are negatively correlated. Thereafter, we sampled a missing rate \(\gamma _{nv}\) from \(\mathcal {U}[ 0.3 + E \cdot c_v \cdot (y^{(n)}-1), 0.7 + E \cdot c_v \cdot (y^{(n)}-1)]\) for each MTS \(X^{(n)}\) and attribute \(v\). The parameter E was tuned such that the absolute value of the Pearson correlation between the missing rates for the attributes \(\gamma _v\) and the labels \(y^{(n)}\) took the values \(\{0.2, \, 0.4, \, 0.6, \, 0.8\}\), respectively. In this way we could control the amount of informative missingness and, because of the way we sampled \(\gamma _{nv}\), the missing rate in each dataset was around 50% independently of the Pearson correlation.

Further, the following procedure was used to create 8 synthetic datasets from the uWave and Character trajectories datasets, which both consist of only 3 attributes. We randomly sampled a number \(c_v \in \{-1, 1\} \) for each attribute \(v \in \{1,\dots , V\} \). Attribute(s) with \(c_v = -1\) became negatively correlated with the labels by sampling \(\gamma _{nv}\) from \(\mathcal {U}[ 0.7 - E \cdot (y^{(n)}-1), 1 - E \cdot (y^{(n)}-1)]\), whereas the attribute(s) with \(c_v = 1\) became positively correlated with the labels by sampling \(\gamma _{nv}\) from \(\mathcal {U}[ 0.3 + E \cdot (y^{(n)}-1), 0.6 + E \cdot (y^{(n)}-1)]\). The parameter E was computed in the same way as above. Then, we computed the mean of each attribute \(\mu _v\) over the complete dataset and let each element with \( x^{(n)}_v(t) > \mu _v\) be missing with probability \(\gamma _{nv}\). This means that the probability of being missing depends on the value of the missing element, i.e. the missingness mechanism is missing not at random (MNAR) within each class. Hence, this type of informative missingness is not the same as the one we created for the Wafer and Japanese vowels datasets.
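The two sampling procedures above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, labels are assumed to be integers starting at 1, MTS are assumed to be stored as an array of shape (N, T, V), rates are clipped to [0, 1] as a safeguard, and for the first procedure each element is assumed to be dropped independently with probability \(\gamma _{nv}\) (the text does not spell this step out). The indicator convention (1 = observed, 0 = missing) is likewise an assumption.

```python
import numpy as np


def sample_rates_symmetric(y, n_attr, E, rng):
    """gamma_nv ~ U[0.3 + E*c_v*(y-1), 0.7 + E*c_v*(y-1)] (first procedure)."""
    c = rng.choice([-1, 1], size=n_attr)           # sign of correlation per attribute
    shift = E * c[None, :] * (y[:, None] - 1)      # shape (N, V)
    gamma = rng.uniform(0.3 + shift, 0.7 + shift)
    return np.clip(gamma, 0.0, 1.0), c             # clipping is an added safeguard


def sample_rates_asymmetric(y, n_attr, E, rng):
    """Rates for the second procedure (different intervals for c_v = -1 and +1)."""
    c = rng.choice([-1, 1], size=n_attr)
    step = E * (y[:, None] - 1)                    # shape (N, 1)
    low = np.where(c[None, :] == 1, 0.3 + step, 0.7 - step)
    high = np.where(c[None, :] == 1, 0.6 + step, 1.0 - step)
    return np.clip(rng.uniform(low, high), 0.0, 1.0), c


def inject_missing(X, gamma, rng, mu=None):
    """Drop entries of X (N, T, V) with probability gamma[n, v].

    If mu (shape (V,)) is given, only entries above the attribute mean
    can be dropped, as in the second procedure.
    """
    drop = rng.random(X.shape) < gamma[:, None, :]
    if mu is not None:
        drop &= X > mu
    X_miss = X.copy()
    X_miss[drop] = np.nan
    R = (~drop).astype(int)                        # assumed: 1 = observed, 0 = missing
    return X_miss, R


def abs_label_correlation(gamma, y):
    """Absolute Pearson correlation between each attribute's rates and the labels."""
    return np.array([abs(np.corrcoef(gamma[:, v], y)[0, 1])
                     for v in range(gamma.shape[1])])


rng = np.random.default_rng(0)
y = np.tile([1, 2], 100)                           # toy two-class labels
X = rng.normal(size=(200, 12, 3))                  # toy data, not a benchmark dataset
gamma, c = sample_rates_symmetric(y, 3, E=0.3, rng=rng)
X_miss, R = inject_missing(X, gamma, rng)
print(abs_label_correlation(gamma, y), np.isnan(X_miss).mean())
```

In such a sketch, E could be tuned by repeating the sampling for a range of candidate values and keeping the one whose `abs_label_correlation` is closest to the target; the exact tuning procedure used for the reported datasets is not described in the text.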

Baselines. Three baseline models were created. The first baseline, namely ordinary TCK, ignores the missingness mechanism. In the second one, referred to as TCK\(_B\), we modeled the missing patterns naively by concatenating the binary missing indicator MTS R to the MTS X, creating a new MTS U with 2V attributes. Then, ordinary TCK was trained on the datasets consisting of \(\{U^{(n)}\}\). In the third baseline, TCK\(_0\), we investigated how well informative missingness can be captured by imputing zeros for the missing values and then training the TCK on the imputed data.
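The input preparation for the TCK\(_B\) and TCK\(_0\) baselines can be sketched as below. The function names are hypothetical, missing values are assumed to be encoded as NaN in an array of shape (N, T, V), and the indicator is assumed to be 1 for observed and 0 for missing entries. Training the TCK itself is not shown.

```python
import numpy as np


def with_missing_indicators(X):
    """TCK_B-style input: concatenate the indicator R to X, giving 2V attributes."""
    R = (~np.isnan(X)).astype(X.dtype)   # assumed convention: 1 = observed, 0 = missing
    return np.concatenate([X, R], axis=-1)


def zero_impute(X):
    """TCK_0-style input: replace every missing value with zero."""
    return np.where(np.isnan(X), 0.0, X)


# Toy example: one MTS with T = 2 time steps and V = 2 attributes.
X = np.array([[[1.0, np.nan],
               [np.nan, 4.0]]])
U = with_missing_indicators(X)           # shape (1, 2, 4)
X0 = zero_impute(X)                      # shape (1, 2, 2), no NaN left
print(U.shape, X0.shape)
```

Note that U still contains the NaN entries of X in its first V attributes, so the kernel trained on it has to handle missing values itself, whereas the zero-imputed data is complete.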

Table 5 Performance (accuracy) of TCK\(_{IM}\) and three baselines

Results. Table 5 shows the performance of the proposed TCK\(_{IM}\) and the three baselines for all of the 16 synthetic datasets. We see that the proposed TCK\(_{IM}\) achieves the best accuracy for 14 out of 16 datasets and is the only method which consistently has the expected behaviour, namely that the accuracy increases as the correlation between missing values and class labels increases. It can also be seen that the performance of TCK\(_{IM}\) is similar to TCK when the amount of information in the missing patterns is low, whereas TCK is clearly outperformed when the informative missingness is high. This demonstrates that TCK\(_{IM}\) can effectively exploit informative missingness.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Mikalsen, K.Ø., Soguero-Ruiz, C., Jenssen, R. (2021). A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs. In: Shaban-Nejad, A., Michalowski, M., Buckeridge, D.L. (eds) Explainable AI in Healthcare and Medicine. Studies in Computational Intelligence, vol 914. Springer, Cham. https://doi.org/10.1007/978-3-030-53352-6_3
