Abstract
The majority of medical documents and electronic health records are in text format, which poses a challenge for data processing and for finding relevant documents. Automatically retrieving knowledge from this enormous body of health and medical text has long been an intriguing research topic. Powerful methods for automating text processing have been developed in recent years. Topic modeling, which retrieves information by discovering the themes in a corpus, is one of the popular approaches for health and medical corpora; however, it still needs new perspectives. In this research, we describe fuzzy latent semantic analysis (FLSA), a novel approach to topic modeling from a fuzzy perspective. FLSA can handle the redundancy issue in health and medical corpora and provides a new method for estimating the number of topics. The quantitative evaluations show that FLSA delivers better performance and features than latent Dirichlet allocation (LDA), the most popular topic model.
Cite this article
Karami, A., Gangopadhyay, A., Zhou, B. et al. Fuzzy Approach Topic Discovery in Health and Medical Corpora. Int. J. Fuzzy Syst. 20, 1334–1345 (2018). https://doi.org/10.1007/s40815-017-0327-9
DOI: https://doi.org/10.1007/s40815-017-0327-9