Medical data mining in sentiment analysis based on optimized swarm search feature selection
- 70 Downloads
In this paper, we propose a novel technique termed as optimized swarm search-based feature selection (OS-FS), which is a swarm-type of searching function that selects an ideal subset of features for enhanced classification accuracy. In terms of gaining insights from unstructured medical based texts, sentiment prediction is becoming an increasingly crucial machine learning technique. In fact, due to its robustness and accuracy, it recently gained popularity in the medical industries. Medical text mining is well known as a fundamental data analytic for sentiment prediction. To form a high-dimensional sparse matrix, a popular preprocessing step in text mining is employed to transform medical text strings to word vectors. However, such a sparse matrix poses problems to the induction of accurate sentiment prediction model. The swarm search in our proposed OS-FS can be optimized by a new feature evaluation technique called clustering-by-coefficient-of-variation. In order to find a subset of features from all the original features from the sparse matrix, this type of feature selection has been a commonly utilized dimensionality reduction technique, and has the capability to improve accuracy of the prediction model. We implement this method based on a case scenario where 279 medical articles related to ‘meaningful use functionalities on health care quality, safety, and efficiency’ from a systematic review of previous medical IT literature. For this medical text mining, a multi-class of sentiments, positive, mixed-positive, neutral and negative is recognized from the document contents. Our experimental results demonstrate the superiority of OS-FS over traditional feature selection methods in literature.
KeywordsMedical text mining Optimized swarm search-based feature selection Sentiment prediction Clustering-by-coefficient-of-variation
This paper is supported by the research grant “Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF),” Grant No. MYRG2015-00128-FST, which is offered by the University of Macau, FST, and RDAO.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflicts of interest.
This article does not contain any studies with human participants and animals performed by any of the authors.
- 4.Tsamardinos I, Aliferis CF, Statnikov A (2003) Time and sample efficient discovery of markov blankets and direct causal relations. In Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, ACM Press, pp. 673–678Google Scholar
- 6.Baris S (2008) Fast correlation based filter (FCBF) with a different search strategy. In Proceedings of 23rd international symposium on computer and information sciences, IEEE, Oct. 2008, pp. 1–4Google Scholar
- 7.Hall MA, Smith LA (1999) Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper. In Proceedings of the 12th international florida artificial intelligence research society conference, pp. 235–239Google Scholar
- 9.Fong S, Liang J, Wong R, Ghanavati M (2014) A novel feature selection by clustering coefficients of variations. In: 2014 ninth international conference on digital information management (ICDIM), 29 Sep–1 Oct 2014, pp. 205–213Google Scholar
- 10.Fong S, Liang J, Deb S (2013) Diabetics prediction by using feature selection based on coefficient of variation. In: Proceedings of Wilkes—international conference on computing sciences, New Delhi, November 2013Google Scholar
- 12.Hassanien A-E, Azar T, Snásel A, Kacprzyk V, Abawajy J, J.H. (eds) (2015) Big data in complex systems: challenges and opportunities. Studies in Big Data. Springer, ChamGoogle Scholar
- 14.Platt J (1998) Fast training of support vector machines using sequential minimal optimization. In: Scholkopf B, Burges C, Smola A (eds) Advances in kernel methods: support vector learning. MIT Press, CambridgeGoogle Scholar
- 15.Jacob Eisenstein A, Ahmed, Xing EP (2011) Sparse additive generative models of text. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp. 1041–1048Google Scholar
- 16.Hall MA (1998) Correlation-based feature subset selection for machine learning, PhD thesis, University of Waikato, Hamilton, New ZealandGoogle Scholar
- 17.Liu H, Setiono R (1996) A probabilistic approach to feature selection—a filter solution. In: 13th international conference on machine learning, pp. 319–327Google Scholar
- 20.Bravo Y, Luque G, Alba E (2015) Takeovers time in evolutionary dynamic optimization: from theory to practice. Appl Math Comput 250(1):94–104Google Scholar
- 21.Moraglio A, Di Chio C, Poli R (2007) Geometric Particle Swarm Optimisation. In: Proceedings of the 10th European Conference on Genetic Programming, Berlin, Heidelberg, pp. 125–136Google Scholar
- 23.Fong S, Zhang Y, Fiaidhi J, Mohammed O, Mohammed S (2013) Evaluation of stream mining classifiers for real-time clinical decision support system: a case study of blood glucose prediction in diabetes therapy. Biomed Res Int. https://doi.org/10.1155/2013/274193 CrossRefPubMedPubMedCentralGoogle Scholar