Abstract
This paper describes the nascent field of big health data classification and disease probability prediction on a multi-GPU cluster MapReduce platform. First, we present a novel optimization-based multi-GPU cluster MapReduce system (gcMR) that is general purpose and suitable for processing big health data. Second, we propose a new method, IVP-SVM, to address the inaccuracy of big health data classification and probabilistic disease prediction. To illustrate the power and flexibility of the gcMR platform for big health data, we describe applications of IVP-SVM on gcMR to a broad class of big health data. Experimental results show that, compared with other multi-GPU MapReduce platforms, gcMR achieves an average improvement in computing efficiency of 1.8- to 13.5-fold across different health applications, and that the accuracy of the proposed IVP-SVM across these applications ranges from 85 to 100%. These results motivate the use of gcMR and IVP-SVM as a big health data analytical platform and tool, respectively.
Acknowledgments
This work was funded by the National Natural Science Foundation of China (61572325), National Natural Science Foundation of China (60970012), Doctoral Program of Higher Specialized Research Fund Ph.D. (20113120110008), Shanghai Key Scientific and Technological Project (14511107902, 16DZ1203603), Shanghai Engineering Center Construction Project (GCZX14014), Shanghai Smart Home Massive Things Generic Technology Engineering Center Project (GCZX14014), Shanghai-class Discipline Construction Project (XTKX2012), and HJ Special Fund Research Base (C14001).
Cite this article
Li, J., Chen, Q. & Liu, B. Classification and disease probability prediction via machine learning programming based on multi-GPU cluster MapReduce system. J Supercomput 73, 1782–1809 (2017). https://doi.org/10.1007/s11227-016-1883-8
DOI: https://doi.org/10.1007/s11227-016-1883-8