Frontier knowledge discovery and visualization in cancer field based on KOS and LDA

  • Qingqiang Wu
  • Yichen Kuang
  • Qingqi Hong
  • Yingying She (corresponding author)


Scientific research journals report the latest developments in research across many fields. However, interpreting and using biomedical information remains a complicated problem. A major challenge is how to convert biomedical literature into structured data with practical methods and then analyze that data in a form people can understand. In this paper, a frontier knowledge discovery model based on KOS and LDA is proposed and applied to detect burst topics and their semantic relationships in the cancer field. Experiments show that the model is effective for topic recognition, topic evolution recognition and visualization. Furthermore, combining KOS with LDA effectively removes noisy concepts at the semantic layer and yields good results.
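The abstract states that combining KOS with LDA removes noisy concepts at the semantic layer, but it does not spell out the procedure. The following is only a minimal Python sketch of one plausible post-processing step, not the authors' method. It assumes that an LDA topic is already available as a term-to-probability mapping and that the KOS can be approximated by a hypothetical lookup table from surface terms to preferred concepts, in the spirit of MeSH entry terms mapping to descriptors. Terms with no KOS concept are treated as noise and dropped, synonyms are merged into one concept, and the surviving weights are renormalized. The names `filter_topic_with_kos`, `EXAMPLE_TOPIC` and `EXAMPLE_KOS` and all the numbers are illustrative inventions.

```python
# Hypothetical sketch: filter an LDA topic through a KOS-style lookup table.
# Assumption: the KOS is modelled as a plain dict {surface term -> preferred concept}.

# Illustrative LDA topic (term -> probability); the values are made up.
EXAMPLE_TOPIC = {
    "glioblastoma": 0.20,
    "tumour": 0.15,
    "tumor": 0.10,
    "patients": 0.12,
    "study": 0.08,
    "temozolomide": 0.05,
}

# Illustrative KOS fragment: synonyms map to one preferred concept.
EXAMPLE_KOS = {
    "glioblastoma": "Glioblastoma",
    "tumor": "Neoplasms",
    "tumour": "Neoplasms",
    "temozolomide": "Temozolomide",
}


def filter_topic_with_kos(
    topic_words: dict[str, float],
    kos_lexicon: dict[str, str],
    top_n: int = 10,
) -> list[tuple[str, float]]:
    """Keep only terms that map to a KOS concept, merge synonyms, renormalize."""
    concept_weights: dict[str, float] = {}
    for term, prob in topic_words.items():
        concept = kos_lexicon.get(term.lower())
        if concept is None:
            continue  # no KOS concept: treat the term as semantic noise
        # Synonymous terms accumulate their probability mass on one concept.
        concept_weights[concept] = concept_weights.get(concept, 0.0) + prob

    total = sum(concept_weights.values())
    if total == 0:
        return []  # every term was noise; nothing survives the filter

    # Rank concepts by merged weight and rescale so all kept weights sum to 1.
    ranked = sorted(concept_weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(concept, weight / total) for concept, weight in ranked[:top_n]]


if __name__ == "__main__":
    for concept, weight in filter_topic_with_kos(EXAMPLE_TOPIC, EXAMPLE_KOS):
        print(f"{concept}: {weight:.2f}")
```

In this toy input, generic words such as "patients" and "study" disappear, and "tumour" and "tumor" collapse into a single concept. A real pipeline would more likely map terms through a full vocabulary such as MeSH or the UMLS rather than a hand-written dictionary.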


Keywords: Knowledge organization system (KOS); Latent Dirichlet allocation (LDA); Frontier knowledge; Topic evolution



This project was supported by the National Natural Science Foundation of China (Grant No. 61502402), the Fundamental Research Funds for the Central Universities (Grant No. 20720180073), the State Key Laboratory of Virtual Reality Technology and Systems of China (Grant No. BUAA-VR-15 KF-09) and Xiamen University (Grant No. 20720150081).



Copyright information

© Akadémiai Kiadó, Budapest, Hungary 2019

Authors and Affiliations

  1. Software School, Xiamen University, Xiamen, China
