Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

  • Balakrishnan Sarojini
  • Narayanasamy Ramaraj
  • Savarimuthu Nickolas
Part of the Communications in Computer and Information Science book series (CCIS, volume 40)

Abstract

Medical data mining is the search for relationships and patterns within medical datasets that could provide useful knowledge for effective clinical decisions. Including irrelevant, redundant, and noisy features in the process model results in poor predictive accuracy. Much research in data mining has gone into improving the predictive accuracy of classifiers by applying feature selection techniques. Feature selection is valuable in medical data mining because it allows a disease to be diagnosed, as part of patient care, with a minimum number of significant features. The objective of this work is to show that selecting the more significant features improves the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on a reduced feature subset of the diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces false negatives and false positives.
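The abstract does not say how individual features are scored. As a minimal sketch only, assuming the standard (non-kernel) F-score for two-class data, which may differ from the kernel variant named in the title, the ranking step could look like the following. The names `f_scores` and `select_features`, the `threshold` parameter, and the toy data are hypothetical and are not taken from the paper.

```python
from statistics import mean, variance


def f_scores(rows, labels):
    """Return one F-score per feature column for a two-class dataset.

    Assumption: label 1 marks the positive class and every other label
    the negative class; each class needs at least two samples.
    """
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y != 1]
    scores = []
    for i in range(len(rows[0])):
        col_all = [r[i] for r in rows]
        col_pos = [r[i] for r in pos]
        col_neg = [r[i] for r in neg]
        overall = mean(col_all)
        # Numerator: how far each class mean sits from the overall mean.
        between = (mean(col_pos) - overall) ** 2 + (mean(col_neg) - overall) ** 2
        # Denominator: sum of the per-class sample variances (n - 1 divisor).
        within = variance(col_pos) + variance(col_neg)
        scores.append(between / within if within > 0 else 0.0)
    return scores


def select_features(rows, labels, threshold):
    """Return the indices of features whose F-score meets the threshold."""
    return [i for i, s in enumerate(f_scores(rows, labels)) if s >= threshold]


# Toy data (illustrative only): feature 0 separates the classes, feature 1 does not.
toy_rows = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 6.0]]
toy_labels = [0, 0, 1, 1]
print(f_scores(toy_rows, toy_labels))              # [24.5, 0.125]
print(select_features(toy_rows, toy_labels, 1.0))  # [0]
```

A higher score means the class means are far apart relative to the spread within each class. The reduced column set would then be passed to the classifier in place of the full feature set.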

Keywords

Medical data mining, Feature selection, Predictive accuracy, False negative, False positive, LibSVM classifier

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Balakrishnan Sarojini (1, 2)
  • Narayanasamy Ramaraj (3)
  • Savarimuthu Nickolas (4)

  1. PhD Research Scholar, Department of Computer Science, Mother Teresa Women’s University, Kodaikanal, India
  2. Working as Professor, K.L.N. College of Information Technology, Madurai, India
  3. Principal, G.K.M. College of Engineering & Technology, Chennai, India
  4. Assistant Professor, Department of Computer Applications, National Institute of Technology, Tiruchirappalli, India
