Neural Computing and Applications, Volume 26, Issue 8, pp 1859–1880

A novel hybrid feature selection method based on rough set and improved harmony search

  • H. Hannah Inbarani
  • M. Bagyamathi
  • Ahmad Taher Azar (email author)
Original Article

Abstract

Feature selection is the process of selecting an optimal subset of features that yields the most predictive outcome, and it is one of the essential steps in knowledge discovery. The problem is that not all features are important: most may be redundant, and the rest may be irrelevant or noisy. This paper presents a novel feature selection approach to deal with the high dimensionality of medical datasets. Medical datasets are typically characterized by a large number of measurements, most of them irrelevant or noisy, and a comparatively small number of patient records. The paper proposes a supervised feature selection method based on Rough Set Quick Reduct hybridized with the Improved Harmony Search algorithm. Rough set theory is one of the most successful methods used for feature selection. Harmony search, which underlies the proposed Rough Set Improved Harmony Search Quick Reduct (RS-IHS-QR) algorithm, is a relatively new population-based meta-heuristic optimization algorithm. It imitates the music improvisation process, in which each musician adjusts their instrument's pitch in search of a perfect state of harmony. The quality of the reduced data is measured by classification performance. The proposed algorithm is compared experimentally with the existing algorithms Rough Set Quick Reduct (RS-QR) and Rough Set Particle Swarm Optimization Quick Reduct (RS-PSO-QR). The proposed method selects comparatively few features, achieves more than 90% classification accuracy in most cases, and takes less time to reduce the dataset than the existing methods. The experimental results demonstrate the efficiency and effectiveness of the proposed algorithm.
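
For readers unfamiliar with the rough-set baseline named in the abstract, the following is a minimal sketch of the standard rough-set dependency degree and the greedy Quick Reduct procedure (the RS-QR baseline), not the authors' RS-IHS-QR implementation. The function names `dependency` and `quick_reduct`, the dictionary-based row format, and the toy data are assumptions made for illustration only; the harmony search hybridization is not shown.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Rough-set dependency degree of `decision` on `attrs`.

    It is the fraction of rows whose equivalence class under `attrs`
    contains a single decision value (the positive region).
    """
    if not rows:
        return 0.0
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        classes[key].append(row[decision])
    positive = sum(
        len(labels) for labels in classes.values() if len(set(labels)) == 1
    )
    return positive / len(rows)


def quick_reduct(rows, conditional, decision):
    """Greedy Quick Reduct: repeatedly add the attribute that raises the
    dependency degree most, until it matches that of the full attribute set."""
    target = dependency(rows, conditional, decision)
    reduct = []
    best = dependency(rows, reduct, decision)
    while best < target:
        candidates = [a for a in conditional if a not in reduct]
        scores = {a: dependency(rows, reduct + [a], decision) for a in candidates}
        choice = max(candidates, key=lambda a: scores[a])
        reduct.append(choice)
        best = scores[choice]
    return reduct


# Hypothetical toy data: the decision "d" is fully determined by attribute "b".
toy_rows = [
    {"a": 0, "b": 0, "c": 1, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 0, "d": "no"},
    {"a": 1, "b": 1, "c": 0, "d": "yes"},
]

print(quick_reduct(toy_rows, ["a", "b", "c"], "d"))  # → ['b']
```

In a hybrid of the kind the abstract describes, a dependency measure like this would plausibly serve as the fitness of candidate feature subsets generated by the meta-heuristic rather than by the greedy loop; that reading is an assumption, since the abstract does not give the details.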

Keywords

Feature selection; Rough sets; Quick reduct; Particle swarm optimization; Improved harmony search

Copyright information

© The Natural Computing Applications Forum 2015

Authors and Affiliations

  • H. Hannah Inbarani (1)
  • M. Bagyamathi (2)
  • Ahmad Taher Azar (3), email author
  1. Department of Computer Science, Periyar University, Salem, India
  2. Department of Computer Science, Gonzaga College of Arts and Science for Women, Krishnagiri, India
  3. Faculty of Computers and Information, Benha University, Banha, Egypt
