Abstract
Prediction is the process of analyzing current and past events to anticipate future events. Predicting future conditions remains an important step in many applications for minimizing risk. Several techniques have been developed for predictive analysis with big data. However, existing techniques do not achieve accurate prediction with low complexity when handling a large volume of data. To improve prediction accuracy with less complexity, a Gramian symmetric data collection-based random forest bivariate regression and classification (GSDC-RFBRC) technique is developed. Initially, a large volume of data is collected from the dataset. Then, a Gramian symmetric matrix is used to store the data in the rows and columns of a matrix. Next, classification and regression are carried out using random decision forests to predict future outcomes. The regression process measures the relationship between a dependent variable (i.e., the outcome) and the independent variables (i.e., the data) through bivariate correlation. The random decision forest constructs a number of decision trees for classification based on this correlation. Finally, it combines the decision trees and applies a voting scheme: the majority vote of the classification results is taken as the prediction to achieve high prediction accuracy. Experimental evaluation is carried out on factors such as prediction accuracy, prediction time, false-positive rate and space complexity with respect to the number of data items (i.e., files). The results confirm that the proposed GSDC-RFBRC technique improves prediction accuracy and reduces prediction time, false-positive rate and space complexity.
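The pipeline described above (Gramian symmetric matrix, bivariate correlation, a forest of trees, majority vote) can be illustrated with a minimal sketch. This is not the authors' GSDC-RFBRC implementation. It assumes the standard Gram matrix definition G = XᵀX, Pearson correlation as the bivariate measure, one-split trees (stumps) that split on the feature most correlated with the outcome, bootstrap resampling, and binary labels 0/1. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def gram_matrix(rows):
    """Return G = X^T X for equal-length feature rows (symmetric by construction)."""
    n_features = len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) for j in range(n_features)]
            for i in range(n_features)]


def pearson(xs, ys):
    """Bivariate (Pearson) correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def fit_stump(rows, labels):
    """Assumed tree model: one split on the feature most correlated with the label."""
    n_features = len(rows[0])
    corrs = [pearson([r[j] for r in rows], labels) for j in range(n_features)]
    best = max(range(n_features), key=lambda j: abs(corrs[j]))
    threshold = sum(r[best] for r in rows) / len(rows)  # split at the feature mean
    above = 1 if corrs[best] >= 0 else 0  # class predicted above the threshold
    return best, threshold, above


def predict_stump(stump, row):
    feature, threshold, above = stump
    return above if row[feature] > threshold else 1 - above


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return forest


def predict_forest(forest, row):
    """Combine the trees by majority vote."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: feature 0 drives the outcome, feature 1 is noise.
X = [[1.0, 5.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0],
     [7.0, 5.0], [8.0, 3.0], [9.0, 6.0], [10.0, 4.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

G = gram_matrix(X)
forest = fit_forest(X, y)
low_risk = predict_forest(forest, [1.0, 4.0])
high_risk = predict_forest(forest, [10.0, 4.0])
```

Splitting each stump at the feature mean keeps the sketch short. A fuller random forest would grow deeper trees and sample a random subset of features at each split.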
Acknowledgements
The first author is thankful to the management of Kalasalingam Academy of Research and Education for providing fellowship.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Communicated by Sahul Smys.
About this article
Cite this article
Arun Kumar, S., Venkatesulu, M. Gramian matrix data collection-based random forest classification for predictive analytics with big data. Soft Comput 23, 8621–8631 (2019). https://doi.org/10.1007/s00500-019-04014-2
DOI: https://doi.org/10.1007/s00500-019-04014-2