A Software Framework for Building Biomedical Machine Learning Classifiers through Grid Computing Resources

  • ORIGINAL PAPER
  • Journal of Medical Systems

Abstract

This paper describes the BiomedTK software framework, created to perform massive explorations of machine learning classifier configurations for biomedical data analysis over distributed Grid computing resources. BiomedTK integrates ROC analysis throughout the complete classifier construction process and enables large parameter sweeps for training third-party classifiers such as artificial neural networks and support vector machines, harnessing the computing power offered by Grid infrastructures. In addition, it includes classifiers modified by the authors for ROC optimization, as well as functionality to build ensemble classifiers and manipulate datasets (import/export, extract and transform data, etc.). BiomedTK was experimentally validated by training thousands of classifier configurations for representative biomedical UCI datasets, reaching, in little time, classification levels comparable to those reported in the existing literature. The comprehensive method presented here improves biomedical data analysis in both methodology and the potential reach of machine-learning-based experimentation.
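The idea of a ROC-driven parameter sweep can be illustrated with a minimal Python sketch. It is not BiomedTK code and does not reflect its API: the helper names (`auc`, `sweep`, `toy_score`), the parameter grid, and the six-sample dataset are all hypothetical, and the "classifier" is a fixed weighted sum standing in for a trained model. The sketch expands a grid into individual configurations and ranks each one by the area under the ROC curve, computed here through the Mann-Whitney rank statistic.

```python
from itertools import product


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sweep(grid):
    """Expand a dict of parameter lists into one dict per configuration."""
    names = sorted(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


def toy_score(x, w, bias):
    # Hypothetical stand-in for a trained classifier: weighted sum of two features.
    return w * x[0] + (1.0 - w) * x[1] + bias


# Made-up data for illustration only: two features per sample, binary labels.
X = [(0.1, 0.9), (0.2, 0.8), (0.8, 0.3), (0.9, 0.1), (0.4, 0.6), (0.7, 0.2)]
y = [0, 0, 1, 1, 0, 1]

# Hypothetical parameter grid; each configuration is evaluated independently.
grid = {"w": [0.0, 0.5, 1.0], "bias": [0.0, 0.1]}

results = []
for cfg in sweep(grid):
    scores = [toy_score(x, cfg["w"], cfg["bias"]) for x in X]
    results.append((auc(y, scores), cfg))

# Keep the configuration with the highest AUC (first one wins on ties).
best_auc, best_cfg = max(results, key=lambda r: r[0])
print(best_auc, best_cfg)
```

Because each configuration is scored independently of the others, the loop body is the natural unit of work to distribute; in this sketch it simply runs sequentially on one machine.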

Acknowledgements

This work is part of the GRIDMED research collaboration project between INEGI (Portugal) and CETA-CIEMAT (Spain). Prof. Guevara acknowledges POPH - QREN-Tipologia 4.2–Promotion of scientific employment funded by the ESF and MCTES, Portugal. CETA-CIEMAT acknowledges the support of the European Regional Development Fund.

Author information

Corresponding author

Correspondence to Raúl Ramos-Pollán.

About this article

Cite this article

Ramos-Pollán, R., Guevara-López, M.Á. & Oliveira, E. A Software Framework for Building Biomedical Machine Learning Classifiers through Grid Computing Resources. J Med Syst 36, 2245–2257 (2012). https://doi.org/10.1007/s10916-011-9692-3

  • DOI: https://doi.org/10.1007/s10916-011-9692-3
