Abstract
Word embeddings are widely used in many Natural Language Processing (NLP) applications. Training typically involves iterative gradient updates of each word vector, which makes word frequency a major factor in embedding quality: in general, words with few training occurrences end up with poor-quality embeddings. This is problematic because rare and frequent words, even when semantically similar, may end up far from each other in the embedding space.
In this study, we develop KAFE (Knowledge And Frequency adapted Embeddings), which combines adversarial training with a knowledge graph to represent both frequent and rare words efficiently. The goal of adversarial training in KAFE is to minimize the spatial distinguishability (separability) of frequent and rare words in the embedding space. The knowledge graph encourages the embedding to follow the structure of a domain-specific hierarchy, providing an informative prior that is particularly important for words with little training data. We demonstrate the performance of KAFE in representing clinical diagnoses using real-world Electronic Health Records (EHR) data coupled with a knowledge graph. EHRs are notorious for containing an ever-increasing number of rare concepts that must be considered when defining the state of a patient for various downstream applications. Our experiments show that KAFE produces more intelligible embeddings, as seen in visualisations, and achieves higher prediction and stability scores than state-of-the-art methods.
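The two ingredients described in the abstract can be illustrated with a toy objective. This is a minimal sketch under stated assumptions, not KAFE's actual formulation: it assumes a logistic-regression discriminator that tries to tell frequent from rare words by their vectors, and it assumes the knowledge-graph prior is a squared-distance penalty between each concept and its parent in the hierarchy. All function names (`discriminator_loss`, `train_discriminator`, `hierarchy_penalty`, `embedding_objective`) and the weights `lam_adv` and `lam_kg` are hypothetical. The embeddings are rewarded for raising the discriminator's loss, so a low discriminator loss signals that frequent and rare words still occupy separable regions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def discriminator_loss(emb, is_frequent, w, b):
    """Binary cross-entropy of a (hypothetical) logistic-regression discriminator
    that predicts frequent (label 1) vs rare (label 0) words from their vectors."""
    p = sigmoid(emb @ w + b)
    eps = 1e-12  # guards against log(0)
    y = is_frequent.astype(float)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))


def train_discriminator(emb, is_frequent, steps=500, lr=0.5):
    """Fit the discriminator by plain gradient descent, embeddings held fixed."""
    w = np.zeros(emb.shape[1])
    b = 0.0
    y = is_frequent.astype(float)
    for _ in range(steps):
        p = sigmoid(emb @ w + b)
        grad = p - y  # derivative of the BCE with respect to the logit
        w -= lr * emb.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b


def hierarchy_penalty(emb, edges):
    """Assumed knowledge-graph prior: mean squared distance between each
    concept and its parent, for (child, parent) index pairs in `edges`."""
    child, parent = np.array(edges).T
    return np.mean(np.sum((emb[child] - emb[parent]) ** 2, axis=1))


def embedding_objective(task_loss, emb, is_frequent, w, b, edges,
                        lam_adv=0.1, lam_kg=0.1):
    """Loss minimised by the embeddings: task loss, minus the discriminator's
    loss (so fooling it is rewarded), plus the hierarchy prior."""
    return (task_loss
            - lam_adv * discriminator_loss(emb, is_frequent, w, b)
            + lam_kg * hierarchy_penalty(emb, edges))


# Toy demo: 200 words, the first half labelled frequent.
rng = np.random.default_rng(0)
n, d = 200, 8
is_frequent = np.arange(n) < n // 2
separated = rng.normal(size=(n, d))
separated[is_frequent, 0] += 3.0  # frequent words drift into their own region
mixed = rng.normal(size=(n, d))   # geometry carries no frequency signal

w_s, b_s = train_discriminator(separated, is_frequent)
w_m, b_m = train_discriminator(mixed, is_frequent)
print("separable:", discriminator_loss(separated, is_frequent, w_s, b_s))
print("mixed:    ", discriminator_loss(mixed, is_frequent, w_m, b_m))
```

In the demo the discriminator easily separates the frequent words that were shifted along one dimension, but stays close to chance level when the geometry carries no frequency signal; adversarial training aims to push the embeddings from the first situation towards the second. In a full training loop the discriminator and the embeddings would be updated in alternation.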
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Ashfaq, A., Lingman, M., Nowaczyk, S. (2022). KAFE: Knowledge and Frequency Adapted Embeddings. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol 13164. Springer, Cham. https://doi.org/10.1007/978-3-030-95470-3_10
DOI: https://doi.org/10.1007/978-3-030-95470-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95469-7
Online ISBN: 978-3-030-95470-3
eBook Packages: Computer Science, Computer Science (R0)