KELSA: A Knowledge-Enriched Local Sequence Alignment Algorithm for Comparing Patient Medical Records

Explainable AI in Healthcare and Medicine

Part of the book series: Studies in Computational Intelligence ((SCI,volume 914))

Abstract

Sequence alignment methods hold promise for preserving important temporal information in electronic health records (EHRs) when comparing patient medical records. Compared to global sequence alignment, local sequence alignment is more useful for this task. One commonly used local sequence alignment algorithm is the Smith-Waterman algorithm (SWA), which is widely used for aligning biological sequences. However, directly applying this algorithm to patient medical records yields suboptimal performance, because it fails to account for complexities in EHRs such as the temporality of medical events. In this work, we propose a new algorithm, the Knowledge-Enriched Local Sequence Alignment algorithm (KELSA), which incorporates meaningful medical knowledge during sequence alignment. We evaluate our algorithm by comparing it to SWA on synthetic EHR data where the reference alignments are known. Our results show that KELSA aligns better than SWA by inserting new daily events and identifying more similarities between patient medical records. Compared to SWA, KELSA is more suitable for locally comparing patient medical records.
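For orientation, the SWA baseline named in the abstract can be sketched roughly as follows. This is a minimal illustration of the standard Smith-Waterman recurrence with a linear gap penalty applied to sequences of event codes; it is not KELSA and does not model temporality or medical knowledge. The function name `smith_waterman`, the scoring values (match 2, mismatch -1, gap -1), and the example event codes are all assumptions chosen for illustration, not values from the chapter.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def smith_waterman(
    a: Sequence[str],
    b: Sequence[str],
    match: int = 2,      # assumed reward for identical events
    mismatch: int = -1,  # assumed penalty for differing events
    gap: int = -1,       # assumed linear gap penalty
) -> Tuple[int, List[Pair]]:
    """Return the best local alignment score and the aligned event pairs."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] is the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment
                H[i - 1][j - 1] + s,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,    # gap in b
                H[i][j - 1] + gap,    # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    aligned: List[Pair] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            aligned.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned.append((a[i - 1], None))
            i -= 1
        else:
            aligned.append((None, b[j - 1]))
            j -= 1
    aligned.reverse()
    return best, aligned


# Hypothetical event-code sequences for two patients (placeholders only).
patient_a = ["250.00", "401.9", "272.4", "585.9"]
patient_b = ["V70.0", "401.9", "272.4", "786.50"]
score, pairs = smith_waterman(patient_a, patient_b)
print(score, pairs)
```

Because every event is treated as an opaque symbol in this sketch, two records match only where their codes are identical and adjacent in order. That limitation is the kind of gap the abstract attributes to a direct application of SWA.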

Acknowledgements

Funding for this study was provided by NLM (5K01LM012102) and the Center for Clinical and Translational Science (UL1TR002377) from the NIH/NCATS.

Author information

Corresponding author

Correspondence to Lixia Yao.

Copyright information

© 2021 The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Huang, M., Shah, N.D., Yao, L. (2021). KELSA: A Knowledge-Enriched Local Sequence Alignment Algorithm for Comparing Patient Medical Records. In: Shaban-Nejad, A., Michalowski, M., Buckeridge, D.L. (eds) Explainable AI in Healthcare and Medicine. Studies in Computational Intelligence, vol 914. Springer, Cham. https://doi.org/10.1007/978-3-030-53352-6_21
