
Fuzzy Approach Topic Discovery in Health and Medical Corpora

International Journal of Fuzzy Systems

Abstract

Most medical documents and electronic health records are stored as free text, which makes them difficult to process and makes relevant documents hard to find. Automatically retrieving knowledge from this enormous volume of health and medical text has long been an active research problem, and powerful methods for automating text processing have been developed in recent years. Topic modeling, which retrieves information by discovering the themes in health and medical corpora, is one popular approach; however, it still calls for new perspectives. In this research, we describe fuzzy latent semantic analysis (FLSA), a novel topic modeling approach built on a fuzzy perspective. FLSA can handle the redundancy issue in health and medical corpora and provides a new method for estimating the number of topics. Quantitative evaluations show that FLSA delivers superior performance and features compared with latent Dirichlet allocation (LDA), the most popular topic model.
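To make the idea of fuzzy topic discovery more concrete, here is a minimal Python/NumPy sketch of one plausible FLSA-style pipeline. It is not the authors' implementation. The tf-idf weighting, the truncated SVD to `n_dims` dimensions, the use of fuzzy c-means (Bezdek's algorithm) on the reduced document vectors, the interpretation of memberships as P(T|D), and the use of Bayes' rule to derive word-topic probabilities are all assumptions made for illustration, as are the hypothetical function names `flsa` and `fuzzy_c_means`.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means. X has shape (n_samples, n_features).
    Returns a membership matrix of shape (n_samples, c) whose rows sum to 1."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every center; epsilon avoids division by zero.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        if np.abs(U_new - U).max() < tol:
            return U_new
        U = U_new
    return U


def flsa(counts, n_topics, n_dims=2, m=2.0, seed=0):
    """Hypothetical FLSA-style sketch. counts: (n_docs, n_words) raw term counts.
    Returns (p_w_given_t, p_t_given_d)."""
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    # Assumption: tf-idf weighting with idf = log(N / df).
    df = (counts > 0).sum(axis=0)
    weighted = counts * np.log(n_docs / np.maximum(df, 1))
    # Assumption: dimension reduction by truncated SVD.
    U_svd, s, _ = np.linalg.svd(weighted, full_matrices=False)
    reduced = U_svd[:, :n_dims] * s[:n_dims]
    # Fuzzy memberships of documents in clusters, read as P(T|D).
    p_t_given_d = fuzzy_c_means(reduced, n_topics, m=m, seed=seed)
    # Assumption: P(D) is proportional to document length.
    p_d = counts.sum(axis=1) / counts.sum()
    # Bayes' rule: P(D|T) = P(T|D) P(D) / P(T).
    joint_dt = p_t_given_d * p_d[:, None]
    p_d_given_t = joint_dt / joint_dt.sum(axis=0)
    # P(W|T) = sum_D P(W|D) P(D|T), with P(W|D) from normalized counts.
    p_w_given_d = counts / counts.sum(axis=1, keepdims=True)
    p_w_given_t = p_d_given_t.T @ p_w_given_d
    return p_w_given_t, p_t_given_d
```

Because fuzzy memberships let a document belong partly to several topics, each row of `p_t_given_d` is a topic distribution rather than a hard label. In this sketch the number of topics is supplied by hand; the paper's own method for estimating it is not reproduced here.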






Author information


Correspondence to Amir Karami.


About this article


Cite this article

Karami, A., Gangopadhyay, A., Zhou, B. et al. Fuzzy Approach Topic Discovery in Health and Medical Corpora. Int. J. Fuzzy Syst. 20, 1334–1345 (2018). https://doi.org/10.1007/s40815-017-0327-9


  • DOI: https://doi.org/10.1007/s40815-017-0327-9
