Chinese Journal of Integrative Medicine, Volume 17, Issue 4, pp 307–313

Topic model for Chinese medicine diagnosis and prescription regularities analysis: Case on diabetes

  • Xiao-ping Zhang (张小平)
  • Xue-zhong Zhou (周雪忠)
  • Hou-kuan Huang (黄厚宽)
  • Qi Feng (冯 奇)
  • Shi-bo Chen (陈世波)
  • Bao-yan Liu (刘保延)
Thinking and Methodology


Induction of common knowledge or regularities from large-scale clinical data is a vital task for Chinese medicine (CM). In this paper, we propose a data mining method, called the Symptom-Herb-Diagnosis topic (SHDT) model, to automatically extract the common relationships among symptoms, herb combinations and diagnoses from large-scale CM clinical data. The SHDT model is a multi-relational extension of the latent topic model, which can acquire topic structure from discrete corpora (such as document collections) by capturing the semantic relations among words. We applied the SHDT model to discover common CM diagnosis and treatment knowledge for type 2 diabetes mellitus (T2DM) using 3 238 inpatient cases. We obtained meaningful diagnosis and treatment topics (clusters) from the data, which clinically indicated some important medical groups corresponding to comorbid diseases (e.g., heart disease and diabetic kidney disease in T2DM inpatients). The results show that manifestation sub-categories exist in T2DM patients and that these sub-categories need specific, individualised CM therapies. Furthermore, the results demonstrate that this method is helpful for generating CM clinical guidelines for T2DM based on structurally collected clinical data.
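To make the latent topic model that SHDT extends more concrete, the following is a minimal sketch of plain latent Dirichlet allocation fitted by collapsed Gibbs sampling, with each clinical case treated as a "document" whose "words" are symptom and herb tokens. It is not the SHDT model itself: the function name `fit_lda`, the hyperparameter values and the toy cases are all assumptions made for illustration and are not taken from the paper or its data.

```python
import random
from collections import defaultdict


def fit_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for plain LDA (illustrative, not SHDT)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    # Count tables: per-case topic counts, per-topic token counts, topic totals.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(zs)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    # Smoothed estimates of case-topic and topic-token distributions.
    theta = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    phi = [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + vocab_size * beta)
         for w in vocab}
        for k in range(num_topics)
    ]
    return theta, phi


# Hypothetical toy cases; the tokens are invented, not from the study data.
cases = [
    ["symptom:thirst", "symptom:fatigue", "herb:A", "herb:B"],
    ["symptom:thirst", "herb:A", "herb:B"],
    ["symptom:chest_tightness", "symptom:palpitation", "herb:C", "herb:D"],
    ["symptom:palpitation", "herb:C", "herb:D"],
]

theta, phi = fit_lda(cases, num_topics=2)
for k, dist in enumerate(phi):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

Each row of `theta` is a case's mixture over topics and each entry of `phi` is a topic's distribution over symptom and herb tokens. A multi-relational model such as SHDT would additionally tie topics to diagnoses, which this sketch does not attempt.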


latent Dirichlet allocation; Author-Topic model; Dirichlet prior; Chinese medicine; Symptom-Herb-Diagnosis topic model





Copyright information

© Chinese Association of the Integration of Traditional and Western Medicine and Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Xiao-ping Zhang (张小平) (1)
  • Xue-zhong Zhou (周雪忠) (1)
  • Hou-kuan Huang (黄厚宽) (1)
  • Qi Feng (冯 奇) (1)
  • Shi-bo Chen (陈世波) (2)
  • Bao-yan Liu (刘保延) (3)

  1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
  2. Guang’anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, China
  3. China Academy of Chinese Medical Sciences, Beijing, China
