Latent semantic diagnosis in traditional Chinese medicine


Traditional Chinese Medicine (TCM) was the main means of disease control in ancient China. Developed and handed down over thousands of years, TCM is the most influential traditional medical system, with the longest history and the largest population of users. However, there is still considerable room for data-driven TCM information processing to be exploited in real medical applications. In this paper, we propose a statistical diagnosis approach that discovers pathogeneses through latent semantic analysis of symptoms and the corresponding herbs. We assume that the latent pathogenesis is the inherent connection between the symptoms and the herbs within a medical case, and we therefore develop a novel multi-content model based on latent Dirichlet allocation (LDA). We then propose three prescription recommendation algorithms, focusing on permanent cure, on symptom alleviation, and on both. We used the proposed model to analyze two TCM domains: amenorrhea and lung cancer. Experimental results show that the pathogeneses found by our model correspond well with TCM theory and that the model provides a theoretically grounded, data-driven way to establish diagnosis standards. The prescription recommendation algorithms help doctors prescribe treatment more accurately, which can advance the development of TCM diagnosis.
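The abstract does not spell out the model, so the following is only a minimal sketch of one way an LDA-style "multi-content" generative process could look, assuming that each medical case has a single pathogenesis mixture shared by its symptoms and its herbs. All names (`generate_case`, `symptom_dists`, `herb_dists`), the vocabulary sizes, and the Dirichlet parameters are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_case(theta_prior, symptom_dists, herb_dists, n_symptoms, n_herbs, rng):
    """Generate one synthetic case under the assumed shared-mixture process."""
    # Assumption: one pathogenesis mixture per case, shared by both content types.
    theta = sample_dirichlet(theta_prior, rng)
    symptoms, herbs = [], []
    for _ in range(n_symptoms):
        z = sample_categorical(theta, rng)  # latent pathogenesis for this symptom
        symptoms.append(sample_categorical(symptom_dists[z], rng))
    for _ in range(n_herbs):
        z = sample_categorical(theta, rng)  # latent pathogenesis for this herb
        herbs.append(sample_categorical(herb_dists[z], rng))
    return theta, symptoms, herbs


rng = random.Random(0)
K, S, H = 2, 4, 3  # hypothetical sizes: pathogeneses, symptom vocabulary, herb vocabulary
symptom_dists = [sample_dirichlet([0.5] * S, rng) for _ in range(K)]
herb_dists = [sample_dirichlet([0.5] * H, rng) for _ in range(K)]
theta, symptoms, herbs = generate_case([1.0] * K, symptom_dists, herb_dists, 5, 3, rng)
```

In this sketch the shared mixture `theta` is what ties a case's symptoms to its herbs; fitting such a model to real cases would need an inference procedure (for example Gibbs sampling), which is not shown here.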



This work was supported by NSFC grants (No. 61472141 and 61532021), Shanghai Knowledge Service Platform Project (No. ZF1213), and Shanghai Agriculture Applied Technology Development Program (Grant No. T20150302).

Author information

Corresponding author

Correspondence to Xiaoling Wang.

About this article

Cite this article

Ji, W., Zhang, Y., Wang, X. et al. Latent semantic diagnosis in traditional Chinese medicine. World Wide Web 20, 1071–1087 (2017).

  • Latent semantic model
  • Traditional Chinese medicine
  • Recommendation