A Software Framework for Building Biomedical Machine Learning Classifiers through Grid Computing Resources

Ramos-Pollán, Raúl; Guevara-López, Miguel Ángel; Oliveira, Eugénio

doi:10.1007/s10916-011-9692-3

ORIGINAL PAPER
Published: 09 April 2011

Volume 36, pages 2245–2257, (2012)

Journal of Medical Systems

Raúl Ramos-Pollán¹,
Miguel Ángel Guevara-López² &
Eugénio Oliveira³

Abstract

This paper describes the BiomedTK software framework, created to explore large numbers of machine learning classifier configurations for biomedical data analysis over distributed Grid computing resources. BiomedTK integrates ROC analysis throughout the classifier construction process and supports large parameter sweeps for training third-party classifiers such as artificial neural networks and support vector machines, drawing on the computing power provided by Grid infrastructures. It also includes classifiers modified by the authors for ROC optimization, along with functionality to build ensemble classifiers and to manipulate datasets (import/export, extraction and transformation of data, etc.). BiomedTK was validated experimentally by training thousands of classifier configurations on representative biomedical UCI datasets, reaching in little time classification levels comparable to those reported in the existing literature. The method presented here improves biomedical data analysis in both methodology and the potential reach of machine-learning-based experimentation.
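
To make the two central ideas of the abstract concrete, the following is a minimal Python sketch, not BiomedTK's actual API, of (a) expanding a parameter sweep into individual classifier configurations, each of which could in principle be submitted as an independent Grid job, and (b) scoring a classifier's outputs by the area under the ROC curve. The function names `sweep` and `auc`, the parameter names `kernel` and `C`, and the sample scores are all hypothetical and chosen only for illustration.

```python
from itertools import product


def sweep(grid):
    """Expand a dict of parameter lists into one dict per configuration."""
    names = sorted(grid)
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def auc(scores, labels):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical sweep: 2 kernels x 3 cost values = 6 configurations.
grid = {"kernel": ["linear", "rbf"], "C": [0.1, 1.0, 10.0]}
configs = sweep(grid)
print(len(configs))  # 6

# Hypothetical classifier scores for four test cases (1 = positive class).
print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75
```

Because every configuration in the sweep is independent of the others, the list returned by `sweep` maps naturally onto a batch of separate jobs, which is what makes this kind of exploration a good fit for distributed resources.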

Acknowledgements

This work is part of the GRIDMED research collaboration project between INEGI (Portugal) and CETA-CIEMAT (Spain). Prof. Guevara acknowledges POPH-QREN Tipologia 4.2 (Promotion of scientific employment), funded by the ESF and MCTES, Portugal. CETA-CIEMAT acknowledges the support of the European Regional Development Fund.

Author information

Authors and Affiliations

CETA-CIEMAT Centro Extremeño de Tecnologías Avanzadas, Calle Sola 1, 10200, Trujillo, Spain
Raúl Ramos-Pollán
INEGI-Faculdade de Engenharia, Universidade do Porto, Rua Roberto Frias 400, 4200-465, Porto, Portugal
Miguel Ángel Guevara-López
LIACC-DEI-Faculdade de Engenharia, Universidade do Porto, Rua Roberto Frias s/n, 4200-465, Porto, Portugal
Eugénio Oliveira

Corresponding author

Correspondence to Raúl Ramos-Pollán.

About this article

Cite this article

Ramos-Pollán, R., Guevara-López, M.Á. & Oliveira, E. A Software Framework for Building Biomedical Machine Learning Classifiers through Grid Computing Resources. J Med Syst 36, 2245–2257 (2012). https://doi.org/10.1007/s10916-011-9692-3

Received: 14 December 2010
Accepted: 28 March 2011
Published: 09 April 2011
Issue Date: August 2012
DOI: https://doi.org/10.1007/s10916-011-9692-3
