International Journal of Fuzzy Systems, Volume 20, Issue 4, pp 1334–1345

Fuzzy Approach Topic Discovery in Health and Medical Corpora

  • Amir Karami
  • Aryya Gangopadhyay
  • Bin Zhou
  • Hadi Kharrazi


Most medical documents and electronic health records are stored as text, which poses a challenge for data processing and for finding relevant documents. Automatically retrieving the enormous amount of health and medical knowledge they contain has long been an intriguing topic, and powerful methods have been developed in recent years to automate text processing. One popular approach, topic modeling, retrieves information by discovering the themes in health and medical corpora; however, this approach still needs new perspectives. In this research, we describe fuzzy latent semantic analysis (FLSA), a novel approach to topic modeling from a fuzzy perspective. FLSA can handle the redundancy issue in health and medical corpora and provides a new method for estimating the number of topics. Quantitative evaluations show that FLSA offers superior performance and features compared with latent Dirichlet allocation, the most popular topic model.


Text mining, Topic model, Medical, Health, Fuzzy approach



Copyright information

© Taiwan Fuzzy Systems Association and Springer-Verlag Berlin Heidelberg 2017

Authors and Affiliations

  1. School of Library and Information Science, University of South Carolina, Columbia, USA
  2. Information Systems Department, University of Maryland Baltimore County, Baltimore, USA
  3. Bloomberg School of Public Health, Johns Hopkins University, Baltimore, USA
