A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs

Mikalsen, Karl Øyvind; Soguero-Ruiz, Cristina; Jenssen, Robert

doi:10.1007/978-3-030-53352-6_3

Part of the book series: Studies in Computational Intelligence (SCI, volume 914)

Abstract

A large fraction of electronic health records (EHRs) consists of clinical measurements collected over time, such as lab tests and vital signs, which provide important information about a patient’s health status. These sequences of clinical measurements are naturally represented as time series, characterized by multiple variables and large amounts of missing data, which complicate the analysis. In this work, we propose a novel kernel that is capable of exploiting both the information from the observed values and the information hidden in the missing patterns in multivariate time series (MTS) originating e.g. from EHRs. The kernel, called TCK\(_{IM}\), is designed using an ensemble learning strategy in which the base models are novel mixed-mode Bayesian mixture models that can effectively exploit informative missingness without having to resort to imputation methods. Moreover, the ensemble approach ensures robustness to hyperparameters, and therefore TCK\(_{IM}\) is particularly well suited when labels are lacking, a known challenge in medical applications. Experiments on three real-world clinical datasets demonstrate the effectiveness of the proposed kernel.

Acknowledgements

The authors would like to thank K. Hindberg for assistance on extraction of EHR data and the physicians A. Revhaug, R.-O. Lindsetmo and K. M. Augestad for helpful guidance throughout the study.

Author information

Authors and Affiliations

University Hospital of North-Norway, Tromsø, Norway
Karl Øyvind Mikalsen
Department of Physics and Technology, UiT The Arctic University of Norway, 9037, Tromsø, Norway
Karl Øyvind Mikalsen, Cristina Soguero-Ruiz & Robert Jenssen
Rey Juan Carlos University, Móstoles, Spain
Cristina Soguero-Ruiz

Corresponding author

Correspondence to Karl Øyvind Mikalsen.

Editor information

Editors and Affiliations

Department of Pediatrics, College of Medicine, The University of Tennessee Health Science Center (UTHSC), Oak-Ridge National Lab (ORNL), Memphis, TN, USA
Arash Shaban-Nejad
School of Nursing, University of Minnesota, Minneapolis, MN, USA
Martin Michalowski
McGill Clinical & Health Informatics, Montreal, QC, Canada
David L. Buckeridge

Appendix—Synthetic benchmark datasets

To test how well TCK\(_{IM}\) performs for varying degrees of informative missingness, we generated a total of 16 synthetic datasets by randomly injecting missing data into 4 MTS benchmark datasets. The characteristics of the datasets are described in Table 4. We transformed all MTS in each dataset to the same length, T, where T is given by \( T = \left\lceil T_{max} / \left\lceil T_{max}/25 \right\rceil \right\rceil \). Here, \( \lceil \cdot \rceil \) is the ceiling operator and \(T_{max}\) is the length of the longest MTS in the original dataset.
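As a small illustration of the length formula above, the following sketch evaluates T for a few values of \(T_{max}\). The function name `target_length` and the example values are hypothetical and are not taken from Table 4; how the series are actually resampled to length T is not specified here and is not shown.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b, avoiding floating-point rounding."""
    return -(-a // b)


def target_length(t_max: int, block: int = 25) -> int:
    """T = ceil(T_max / ceil(T_max / block)), with block = 25 as in the text."""
    return ceil_div(t_max, ceil_div(t_max, block))


# Hypothetical longest-series lengths, chosen only to exercise the formula.
for t_max in (25, 26, 29, 198):
    print(t_max, target_length(t_max))
```

For any \(T_{max} \ge 1\) the formula yields a length of at most 25, since the inner ceiling is at least \(T_{max}/25\).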

Table 4 Characteristics of the benchmark datasets. Attr is the number of attributes; Train and Test are the numbers of training and test samples; \(N_c\) is the number of classes; \(T_{min}\) and \(T_{max}\) are the lengths of the shortest and longest MTS; and T is the MTS length after the transformation

Datasets. The following procedure was used to create 8 synthetic datasets with missing data from the Wafer and Japanese vowels datasets. We randomly sampled a number \(c_v \in \{-1, 1\} \) for each attribute \(v \in \{1,\dots , V\} \), where \(c_v =1\) indicates that the attribute and the labels are positively correlated and \(c_v =-1\) negatively correlated. Thereafter, we sampled a missing rate \(\gamma _{nv}\) from \(\mathcal {U}[ 0.3 + E \cdot c_v \cdot (y^{(n)}-1), 0.7 + E \cdot c_v \cdot (y^{(n)}-1)]\) for each MTS \(X^{(n)}\) and attribute. The parameter E was tuned such that the Pearson correlation (absolute value) between the missing rates for the attributes \(\gamma _v\) and the labels \(y^{(n)}\) took the values \(\{0.2, \, 0.4, \, 0.6, \, 0.8\}\), respectively. By doing so, we could control the amount of informative missingness and because of the way we sampled \(\gamma _{nv}\), the missing rate in each dataset was around 50% independently of the Pearson correlation.

Further, the following procedure was used to create 8 synthetic datasets from the uWave and Character trajectories datasets, which both consist of only 3 attributes. We randomly sampled a number \(c_v \in \{-1, 1\} \) for each attribute \(v \in \{1,\dots , V\} \). Attribute(s) with \(c_v = -1\) became negatively correlated with the labels by sampling \(\gamma _{nv}\) from \(\mathcal {U}[ 0.7 - E \cdot (y^{(n)}-1), 1 - E \cdot (y^{(n)}-1)]\), whereas the attribute(s) with \(c_v = 1\) became positively correlated with the labels by sampling \(\gamma _{nv}\) from \(\mathcal {U}[ 0.3 + E \cdot (y^{(n)}-1), 0.6 + E \cdot (y^{(n)}-1)]\). The parameter E was computed in the same way as above. Then, we computed the mean of each attribute \(\mu _v\) over the complete dataset and let each element with \( x^{(n)}_v(t) > \mu _v\) be missing with probability \(\gamma _{nv}\). This means that the probability of being missing is dependent on the value of the missing element, i.e. the missingness mechanism is MNAR within each class. Hence, this type of informative missingness is not the same as the one we created for the Wafer and Japanese vowels datasets.
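The two sampling procedures can be sketched in NumPy as follows. This is only an illustration under stated assumptions, not the authors' implementation: the array layout `(N, V, T)`, all function names, the clipping of rates to \([0, 1]\), integer labels starting at 1, and the rule that each element is dropped independently with probability \(\gamma_{nv}\) in the first procedure are assumptions made here. The tuning of E is not reproduced; the last lines only show how the absolute Pearson correlation between missing rates and labels could be measured for a given E.

```python
import numpy as np


def sample_rates_first(y, n_attr, E, rng):
    """gamma_nv ~ U[0.3 + E*c_v*(y-1), 0.7 + E*c_v*(y-1)], one c_v per attribute."""
    y = np.asarray(y, dtype=float)
    c = rng.choice([-1, 1], size=n_attr)            # sign of correlation per attribute
    shift = E * c[None, :] * (y[:, None] - 1.0)     # shape (N, V)
    gamma = rng.uniform(0.3 + shift, 0.7 + shift)
    return np.clip(gamma, 0.0, 1.0), c              # clipping is an assumption


def sample_rates_second(y, n_attr, E, rng):
    """c_v = 1: U[0.3 + E(y-1), 0.6 + E(y-1)];  c_v = -1: U[0.7 - E(y-1), 1 - E(y-1)]."""
    y = np.asarray(y, dtype=float)
    c = rng.choice([-1, 1], size=n_attr)
    s = E * (y[:, None] - 1.0)                      # shape (N, 1)
    positive = c[None, :] == 1                      # shape (1, V)
    low = np.where(positive, 0.3 + s, 0.7 - s)      # broadcast to (N, V)
    high = np.where(positive, 0.6 + s, 1.0 - s)
    return np.clip(rng.uniform(low, high), 0.0, 1.0), c


def drop_any(X, gamma, rng):
    """Assumed rule for the first procedure: drop each element with prob. gamma[n, v]."""
    miss = rng.random(X.shape) < gamma[:, :, None]
    return np.where(miss, np.nan, X)


def drop_above_mean(X, gamma, rng):
    """Second procedure: only elements above the attribute mean mu_v may go missing."""
    mu = X.mean(axis=(0, 2), keepdims=True)         # mu_v over the complete dataset
    miss = (X > mu) & (rng.random(X.shape) < gamma[:, :, None])
    return np.where(miss, np.nan, X)


# Toy data (hypothetical): 200 series, 3 attributes, 20 time steps, labels in {1, 2}.
rng = np.random.default_rng(0)
y = rng.integers(1, 3, size=200)
X = rng.normal(size=(200, 3, 20))
gamma, c = sample_rates_second(y, n_attr=3, E=0.2, rng=rng)
X_mnar = drop_above_mean(X, gamma, rng)
# Absolute Pearson correlation between missing rate and label, per attribute.
abs_corr = [abs(np.corrcoef(gamma[:, v], y)[0, 1]) for v in range(3)]
```

In `drop_above_mean`, elements at or below \(\mu_v\) are never removed, which is what makes the missingness depend on the unobserved value itself.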

Baselines. Three baseline models were created. The first baseline, namely ordinary TCK, ignores the missingness mechanism. In the second one, referred to as TCK\(_B\), we modeled the missing patterns naively by concatenating the binary missing indicator MTS R to the MTS X and creating a new MTS U with 2V attributes. Then, ordinary TCK was trained on the datasets consisting of \(\{U^{(n)}\}\). In the third baseline, TCK\(_0\), we investigated how well informative missingness can be captured by imputing zeros for the missing values and then training the TCK on the imputed data.
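The input preparation for the TCK\(_B\) and TCK\(_0\) baselines can be sketched as below. The function names, the `(N, V, T)` array layout, the use of NaN to mark missing values, and the indicator encoding (1 = observed, 0 = missing) are assumptions made for illustration; the TCK training step itself is not shown.

```python
import numpy as np


def tck_b_input(X):
    """Build U by stacking X with its binary indicator R along the attribute axis.

    X has shape (N, V, T) with NaN marking missing values (assumed layout).
    R uses 1 = observed, 0 = missing, which is an assumed encoding.
    """
    R = (~np.isnan(X)).astype(float)
    return np.concatenate([X, R], axis=1)    # shape (N, 2V, T)


def tck_0_input(X):
    """Replace every missing value with zero, leaving observed values untouched."""
    return np.where(np.isnan(X), 0.0, X)


# Tiny example: one series, two attributes, two time steps.
X = np.array([[[1.0, np.nan],
               [np.nan, 2.0]]])
U = tck_b_input(X)       # shape (1, 4, 2); U keeps the NaNs of X in its first V rows
X0 = tck_0_input(X)      # NaNs replaced by 0.0
```

Note that U still contains missing values in its first V attributes, so this baseline relies on ordinary TCK to handle them, while the indicator attributes are fully observed.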

Table 5 Performance (accuracy) of TCK\(_{IM}\) and three baselines

Results. Table 5 shows the performance of the proposed TCK\(_{IM}\) and the three baselines for all of the 16 synthetic datasets. We see that the proposed TCK\(_{IM}\) achieves the best accuracy for 14 out of 16 datasets and is the only method which consistently has the expected behaviour, namely that the accuracy increases as the correlation between missing values and class labels increases. It can also be seen that the performance of TCK\(_{IM}\) is similar to TCK when the amount of information in the missing patterns is low, whereas TCK is clearly outperformed when the informative missingness is high. This demonstrates that TCK\(_{IM}\) can effectively exploit informative missingness.

About this chapter

Cite this chapter

Mikalsen, K.Ø., Soguero-Ruiz, C., Jenssen, R. (2021). A Kernel to Exploit Informative Missingness in Multivariate Time Series from EHRs. In: Shaban-Nejad, A., Michalowski, M., Buckeridge, D.L. (eds) Explainable AI in Healthcare and Medicine. Studies in Computational Intelligence, vol 914. Springer, Cham. https://doi.org/10.1007/978-3-030-53352-6_3

DOI: https://doi.org/10.1007/978-3-030-53352-6_3
Published: 03 November 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-53351-9
Online ISBN: 978-3-030-53352-6
eBook Packages: Intelligent Technologies and Robotics (R0)
