
Classification and disease probability prediction via machine learning programming based on multi-GPU cluster MapReduce system

The Journal of Supercomputing

Abstract

This paper describes the nascent field of big health data classification and disease probability prediction on a multi-GPU cluster MapReduce platform. First, we present a novel optimization-based multi-GPU cluster MapReduce system (gcMR) that is general purpose and suitable for processing big health data. Second, we propose a new method, IVP-SVM, to address the problems of big health data classification and inaccurate probabilistic disease prediction. To illustrate the power and flexibility of the gcMR platform, we describe applications of IVP-SVM on gcMR to a broad class of big health data. Experimental results show that, compared with other multi-GPU MapReduce platforms, gcMR improves average computing efficiency on different health applications by 1.8- to 13.5-fold, and that the accuracy of the proposed IVP-SVM on these applications ranges from 85 to 100%. These results motivate the use of gcMR and IVP-SVM as a big health data analytical platform and tool, respectively.



Acknowledgments

This work was funded by the National Natural Science Foundation of China (61572325, 60970012), Doctoral Program of Higher Specialized Research Fund Ph.D. (20113120110008), Shanghai Key Scientific and Technological Project (14511107902, 16DZ1203603), Shanghai Engineering Center Construction Project (GCZX14014), Shanghai Smart Home massive Things Generic Technology Engineering Center Project (GCZX14014), Shanghai-class Discipline Construction Project (XTKX2012), and HJ Special Fund Research Base (C14001).

Author information

Corresponding author

Correspondence to Qingkui Chen.

Cite this article

Li, J., Chen, Q. & Liu, B. Classification and disease probability prediction via machine learning programming based on multi-GPU cluster MapReduce system. J Supercomput 73, 1782–1809 (2017). https://doi.org/10.1007/s11227-016-1883-8

  • DOI: https://doi.org/10.1007/s11227-016-1883-8
