Abstract
With the rapid growth of biomedical literature, efficiently finding documents of interest has become a challenge for researchers. To improve efficiency and present biomedical information more directly, a semantic text analysis tool based on a medical knowledge graph is developed. The tool builds on an improved bag-of-words model and ontology-based semantic text mining algorithms, and supports visualized analysis of biomedical concepts and documents. Test results show that the proposed models perform well on medical document clustering: accuracy is above 82% on every dataset tested, and the best result reaches 96%.
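As a rough illustration of the bag-of-words idea mentioned in the abstract, the following minimal sketch compares documents by the cosine similarity of their term-count vectors. It is a plain, unmodified bag-of-words baseline and not the improved model or the ontology-based algorithm of the paper; the whitespace tokenizer, the function names, and the sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase, split on whitespace, and count term occurrences (naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, not taken from the paper's datasets.
docs = [
    "gene expression in breast cancer",
    "breast cancer gene mutation",
    "flow cytometry data clustering",
]
vectors = [bag_of_words(d) for d in docs]

related = cosine_similarity(vectors[0], vectors[1])
unrelated = cosine_similarity(vectors[0], vectors[2])
print(f"related pair: {related:.3f}, unrelated pair: {unrelated:.3f}")
```

A pairwise similarity of this kind is the usual input to a document clustering step. Pure term overlap cannot relate synonyms or parent and child concepts, which is the kind of gap an ontology-based similarity measure is meant to close.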
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Grant Nos. 61702324 and 61911540482) in the People’s Republic of China, and by the National Research Foundation of Korea (NRF), funded by the Ministry of Science, ICT and Future Planning (NRF-2019K2A9A2A06020672).
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, M., Hu, J., Ryu, K.H. (2021). An Efficient Tool for Semantic Biomedical Document Analysis. In: Pan, JS., Li, J., Ryu, K.H., Meng, Z., Klasnja-Milicevic, A. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. Smart Innovation, Systems and Technologies, vol 212. Springer, Singapore. https://doi.org/10.1007/978-981-33-6757-9_63
DOI: https://doi.org/10.1007/978-981-33-6757-9_63
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-6756-2
Online ISBN: 978-981-33-6757-9
eBook Packages: Intelligent Technologies and Robotics (R0)