Hypergraph Based Feature Selection Technique for Medical Diagnosis
The impact of the internet and information systems across various domains has resulted in the substantial generation of multidimensional datasets. The use of data mining and knowledge discovery techniques to extract the information contained in these datasets plays a significant role in exploiting the full benefit they provide. The presence of a large number of features in high dimensional datasets incurs a high computational cost in terms of computing power and time. Hence, feature selection techniques are commonly used when building robust machine learning models, to select a subset of relevant features that preserves the maximal information content of the original dataset. In this paper, a novel Rough Set based K – Helly feature selection technique (RSKHT), which hybridizes Rough Set Theory (RST) and the K – Helly property of hypergraph representation, has been designed to identify the optimal feature subset, or reduct, for medical diagnostic applications. Experiments carried out using medical datasets from the UCI repository demonstrate the dominance of RSKHT over other feature selection techniques with respect to reduct size, classification accuracy and time complexity. The performance of RSKHT has been validated using the WEKA tool, which shows that RSKHT is computationally attractive and flexible over massive datasets.
Keywords: Rough set theory (RST); Hypergraph; K – Helly property; High dimensional datasets; Feature selection; Medical diagnosis
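To make the notion of a reduct mentioned in the abstract concrete, the following is a minimal sketch of the classical rough set dependency degree combined with a greedy forward search (in the spirit of QuickReduct). It is not the RSKHT algorithm and does not use the K – Helly property; the function names, the toy patient table and the greedy strategy are all assumptions made for illustration only.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, labels, attrs):
    """Rough set dependency degree: share of rows in the positive region,
    i.e. rows whose equivalence class carries a single decision label."""
    if not rows:
        return 0.0
    consistent = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)

def quick_reduct(rows, labels):
    """Greedy forward selection (hypothetical helper, not RSKHT): add the
    attribute that raises the dependency degree most until the subset
    matches the dependency degree of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in all_attrs if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return reduct

# Toy decision table (invented data): three binary features per patient.
rows = [
    (0, 1, 0),
    (0, 0, 1),
    (1, 1, 0),
    (1, 0, 1),
]
labels = ["sick", "healthy", "sick", "healthy"]

reduct = quick_reduct(rows, labels)
print(reduct)  # indices of the selected features
```

In this toy table the second feature alone separates the two classes, so a single attribute already preserves the dependency degree of the full feature set. A greedy search of this kind returns a subset that preserves the dependency degree, but it is not guaranteed to be minimal in general.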
The first and the fourth authors thank the Department of Science and Technology, India, for the INSPIRE Fellowship (Grant No: DST/INSPIRE Fellowship/2013/963) and the Fund for Improvement of S&T Infrastructure in Universities and Higher Educational Institutions (SR/FST/ETI-349/2013) for their financial support. The second author thanks Tata Consultancy Services for their financial support. The third author thanks the Department of Science and Technology - Fund for Improvement of S&T Infrastructure in Universities and Higher Educational Institutions, Government of India (SR/FST/MSI-107/2015) for their financial support. We would like to express our gratitude to the anonymous reviewers who agreed to review this paper and provided valuable suggestions to improve its quality.