Abstract
Sequence alignment methods promise to preserve important temporal information in electronic health records (EHRs) when comparing patient medical records. Compared with global sequence alignment, local sequence alignment is more useful for comparing patient medical records. A commonly used local sequence alignment algorithm is the Smith-Waterman algorithm (SWA), which is widely used for aligning biological sequences. However, directly applying this algorithm to patient medical records yields suboptimal performance because it fails to consider complex situations in EHRs, such as the temporality of medical events. In this work, we propose a new algorithm called the Knowledge-Enriched Local Sequence Alignment algorithm (KELSA), which incorporates meaningful medical knowledge during sequence alignment. We evaluate our algorithm by comparing it with SWA on synthetic EHR data for which the reference alignments are known. Our results show that KELSA aligns better than SWA by inserting new daily events and identifying more similarities between patient medical records. Compared with SWA, KELSA is more suitable for locally comparing patient medical records.
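To make the SWA baseline concrete, the following is a minimal sketch of the textbook Smith-Waterman recurrence applied to sequences of diagnosis codes. It assumes a linear gap penalty and a simple match/mismatch score. The `category_score` function is a hypothetical illustration of how medical knowledge could enter the substitution score (ICD-9-CM codes sharing a 3-character category earn partial credit). It is not KELSA's actual scoring scheme, which the abstract does not specify. The patient sequences are made up.

```python
from typing import Callable, List, Sequence, Tuple

# A substitution score maps a pair of event codes to a number.
Score = Callable[[str, str], float]


def simple_score(a: str, b: str) -> float:
    """Plain match/mismatch scoring, as in textbook Smith-Waterman."""
    return 2.0 if a == b else -1.0


def category_score(a: str, b: str) -> float:
    """Hypothetical knowledge-aware score (illustration only, not KELSA):
    ICD-9-CM codes sharing their 3-character category get partial credit."""
    if a == b:
        return 2.0
    if a[:3] == b[:3]:
        return 1.0
    return -1.0


def smith_waterman(
    x: Sequence[str], y: Sequence[str], score: Score = simple_score, gap: float = -1.0
) -> Tuple[float, List[str], List[str]]:
    """Return (best local score, aligned x, aligned y); '-' marks a gap."""
    n, m = len(x), len(y)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local alignment: scores never drop below zero.
            H[i][j] = max(
                0.0,
                H[i - 1][j - 1] + score(x[i - 1], y[j - 1]),  # match/mismatch
                H[i - 1][j] + gap,  # gap in y
                H[i][j - 1] + gap,  # gap in x
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best-scoring cell until a zero cell is reached.
    i, j = best_pos
    ax: List[str] = []
    ay: List[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, ax[::-1], ay[::-1]


# Made-up diagnosis-code sequences for two patients.
patient_a = ["401.9", "250.00", "272.4", "414.01"]
patient_b = ["786.50", "250.02", "272.4", "414.01"]

if __name__ == "__main__":
    print(smith_waterman(patient_a, patient_b))
    print(smith_waterman(patient_a, patient_b, score=category_score))
```

With exact matching only, the local alignment covers `272.4, 414.01` (score 4.0). With the hypothetical category score, it extends to pair `250.00` with `250.02` (score 5.0). This illustrates, in a simplified way, why domain-aware scoring can surface more similarity between records than plain symbol matching.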
Acknowledgements
Funding for this study was provided by NLM (5K01LM012102) and the Center for Clinical and Translational Science (UL1TR002377) from the NIH/NCATS.
Copyright information
© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Huang, M., Shah, N.D., Yao, L. (2021). KELSA: A Knowledge-Enriched Local Sequence Alignment Algorithm for Comparing Patient Medical Records. In: Shaban-Nejad, A., Michalowski, M., Buckeridge, D.L. (eds) Explainable AI in Healthcare and Medicine. Studies in Computational Intelligence, vol 914. Springer, Cham. https://doi.org/10.1007/978-3-030-53352-6_21
DOI: https://doi.org/10.1007/978-3-030-53352-6_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-53351-9
Online ISBN: 978-3-030-53352-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)