The Journal of Supercomputing, Volume 73, Issue 5, pp 1782–1809

Classification and disease probability prediction via machine learning programming based on multi-GPU cluster MapReduce system

  • Jinjing Li
  • Qingkui Chen
  • Bocheng Liu


This paper describes the nascent field of big health data classification and disease probability prediction on a multi-GPU cluster MapReduce platform. First, we present a novel optimization-based multi-GPU cluster MapReduce system (gcMR), which is general purpose and suitable for processing big health data. Second, we propose a new method, IVP-SVM, to address inaccuracy in big health data classification and disease probability prediction. To illustrate the power and flexibility of the gcMR platform, we describe applications of IVP-SVM on gcMR to a broad class of big health data. Experimental results show that gcMR improves average computing efficiency by 1.8- to 13.5-fold across different health applications compared with other multi-GPU MapReduce platforms, and that the accuracy of the proposed IVP-SVM across these applications ranges from 85 to 100%. These results motivate the use of gcMR and IVP-SVM as a big health data analytical platform and tool, respectively.


Classification and disease probability prediction; Multi-GPU cluster-based MapReduce platform (gcMR); IVP-SVM; Big health data analytical



This work was funded by the National Natural Science Foundation of China (61572325, 60970012), Doctoral Program of Higher Specialized Research Fund Ph.D. (20113120110008), Shanghai Key Scientific and Technological Project (14511107902, 16DZ1203603), Shanghai Engineering Center Construction Project (GCZX14014), Shanghai Smart Home massive Things Generic Technology Engineering Center Project (GCZX14014), Shanghai-class Discipline Construction Project (XTKX2012), and HJ Special Fund Research Base (C14001).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. OECE, University of Shanghai for Science and Technology, Shanghai, China
