Guidelines to Select Machine Learning Scheme for Classification of Biomedical Datasets

  • Conference paper: Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5483)

Abstract

Biomedical datasets pose a unique challenge to machine learning and data mining algorithms for classification because of their high dimensionality, multiple classes, noisy data and missing values. This paper provides a comprehensive evaluation of a diverse set of machine learning schemes on a number of biomedical datasets. To this end, we follow a four-step evaluation methodology: (1) pre-processing the datasets to remove any redundancy; (2) classifying the datasets with six different machine learning algorithms: Naive Bayes (probabilistic), multi-layer perceptron (neural network), SMO (support vector machine), IBk (instance-based learner), J48 (decision tree) and RIPPER (rule-based induction); (3) bagging and boosting each algorithm; and (4) combining the best version of each base classifier into a team of classifiers using stacking and voting techniques. Using this methodology, we have performed experiments on 31 different biomedical datasets. To the best of our knowledge, this is the first study in which such a diverse set of machine learning algorithms is evaluated on so many biomedical datasets. The important outcome of our extensive study is a set of promising guidelines that will help researchers choose the best classification scheme for a biomedical dataset of a particular nature.
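
The four-step methodology can be sketched in code. The study itself used Weka implementations (NaiveBayes, MultilayerPerceptron, SMO, IBk, J48 and JRip/RIPPER); the scikit-learn estimators, stand-in dataset and parameters below are illustrative assumptions rather than the authors' setup, and RIPPER has no direct scikit-learn counterpart.

```python
# Minimal sketch of the paper's four-step evaluation methodology, assuming
# scikit-learn analogues of the Weka learners used in the study. The dataset,
# estimators and parameters are illustrative, not the authors' configuration.
from sklearn.datasets import load_breast_cancer          # stand-in biomedical dataset
from sklearn.feature_selection import VarianceThreshold  # step 1: drop redundant features
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Step 1: pre-process to remove redundancy (near-constant features here).
X = VarianceThreshold(threshold=1e-3).fit_transform(X)

# Step 2: base classifiers (the RIPPER rule learner is omitted from this
# sketch because scikit-learn has no counterpart).
base = {
    "naive_bayes": GaussianNB(),
    "mlp": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "tree": DecisionTreeClassifier(random_state=0),
}

# Step 3: bagging each learner; boosting shown only for the tree, since
# AdaBoost requires base learners that accept sample weights.
ensembles = {f"bagged_{name}": BaggingClassifier(estimator=clf, n_estimators=10, random_state=0)
             for name, clf in base.items()}
ensembles["boosted_tree"] = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=3, random_state=0),
    n_estimators=50, random_state=0)

# Step 4: combine the base classifiers with stacking and voting.
combined = {
    "stacking": StackingClassifier(estimators=list(base.items()), cv=5),
    "voting": VotingClassifier(estimators=list(base.items()), voting="soft"),
}

# Rank every scheme by cross-validated AUC, as a proxy for the paper's comparison.
for name, clf in {**base, **ensembles, **combined}.items():
    scores = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
    print(f"{name:16s} mean AUC = {scores.mean():.3f}")
```

Repeating such a comparison per dataset, and grouping datasets by their characteristics (dimensionality, class count, noise, missing values), is the kind of analysis from which the paper's guidelines are distilled.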

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tanwani, A.K., Afridi, J., Shafiq, M.Z., Farooq, M. (2009). Guidelines to Select Machine Learning Scheme for Classification of Biomedical Datasets. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2009. Lecture Notes in Computer Science, vol 5483. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01184-9_12

  • DOI: https://doi.org/10.1007/978-3-642-01184-9_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01183-2

  • Online ISBN: 978-3-642-01184-9

  • eBook Packages: Computer Science, Computer Science (R0)
