Journal of Medical Systems, Volume 36, Issue 4, pp 2245–2257

A Software Framework for Building Biomedical Machine Learning Classifiers through Grid Computing Resources

  • Raúl Ramos-Pollán
  • Miguel Ángel Guevara-López
  • Eugénio Oliveira
ORIGINAL PAPER

Abstract

This paper describes BiomedTK, a software framework created to perform massive explorations of machine learning classifier configurations for biomedical data analysis over distributed Grid computing resources. BiomedTK integrates ROC analysis throughout the complete classifier construction process and enables large parameter sweeps for training third-party classifiers such as artificial neural networks and support vector machines, harnessing the vast computing power offered by Grid infrastructures. In addition, it includes classifiers modified by the authors for ROC optimization, as well as functionality to build ensemble classifiers and manipulate datasets (import/export, extract and transform data, etc.). BiomedTK was experimentally validated by training thousands of classifier configurations on representative biomedical UCI datasets, reaching in little time classification levels comparable to those reported in the existing literature. The comprehensive method presented here improves biomedical data analysis in both methodology and the potential reach of machine-learning-based experimentation.
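
The two ideas the abstract combines, a parameter sweep over classifier configurations and ROC-based evaluation, can be illustrated with a minimal Python sketch. This is not BiomedTK's API: the function names `expand_sweep` and `auc` and the example SVM grid are hypothetical and serve only to show how a sweep expands into independent configurations and how an area under the ROC curve could be computed for each one.

```python
from itertools import product


def expand_sweep(grid):
    """Return one dict per point in the Cartesian product of the parameter lists."""
    names = sorted(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]


def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 0.5)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical sweep for an SVM; parameter names and values are assumptions.
grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1], "kernel": ["rbf"]}
configs = expand_sweep(grid)

# Each configuration is independent of the others, so each one could be
# trained and scored as a separate job on a distributed resource.
for config in configs:
    print(config)
```

Because the configurations do not depend on one another, the number of jobs grows with the product of the parameter list lengths, which is the kind of workload that benefits from distributed execution.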

Keywords

Machine learning classifiers; Biomedical data analysis; ROC analysis; Grid infrastructures

Notes

Acknowledgements

This work is part of the GRIDMED research collaboration project between INEGI (Portugal) and CETA-CIEMAT (Spain). Prof. Guevara acknowledges POPH - QREN-Tipologia 4.2–Promotion of scientific employment, funded by the ESF and MCTES, Portugal. CETA-CIEMAT acknowledges the support of the European Regional Development Fund.

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Raúl Ramos-Pollán (1)
  • Miguel Ángel Guevara-López (2)
  • Eugénio Oliveira (3)

  1. CETA-CIEMAT Centro Extremeño de Tecnologías Avanzadas, Trujillo, Spain
  2. INEGI-Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
  3. LIACC-DEI-Faculdade de Engenharia, Universidade do Porto, Porto, Portugal
