
Gramian matrix data collection-based random forest classification for predictive analytics with big data

Soft Computing

Abstract

Prediction is the process of analyzing current and past events to anticipate future events. In many applications, predicting future conditions remains an important step in minimizing risk. Several techniques have been developed for predictive analysis with big data; however, they do not achieve accurate prediction with low complexity when handling large volumes of data. To improve prediction accuracy with less complexity, a Gramian symmetric data collection-based random forest bivariate regression and classification (GSDC-RFBRC) technique is developed. First, a large volume of data is collected from the dataset. Next, a Gramian symmetric matrix stores the data in the rows and columns of a matrix. Classification and regression are then carried out using random decision forests to find future outcomes. The regression step measures the relationship between a dependent variable (i.e., the outcome) and the independent variables (i.e., the data) through bivariate correlation. The random decision forest constructs a number of decision trees for classification based on this correlation. Finally, it combines the decision trees by applying a voting scheme, and the majority vote of the classification results is taken as the prediction to achieve high accuracy. Experimental evaluation covers prediction accuracy, prediction time, false-positive rate and space complexity with respect to the number of data items (i.e., files). The results confirm that the proposed GSDC-RFBRC technique improves prediction accuracy and reduces prediction time, false-positive rate and space complexity.
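
The building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact construction, so the sketch assumes the Gramian matrix is the standard G = XᵀX over feature columns, that "bivariate correlation" means the Pearson coefficient, and that the voting scheme is a plain majority vote over per-tree labels. All function names (`gram_matrix`, `pearson`, `majority_vote`, `accuracy_and_fpr`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def gram_matrix(rows):
    """Return G = X^T X for a list of equal-length numeric rows.

    G[i][j] is the inner product of feature columns i and j, so G is
    symmetric by construction. (Assumed form; not taken from the paper.)
    """
    n_features = len(rows[0])
    return [
        [sum(row[i] * row[j] for row in rows) for j in range(n_features)]
        for i in range(n_features)
    ]


def pearson(xs, ys):
    """Pearson correlation between one feature column and the outcome."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / (norm_x * norm_y)


def majority_vote(votes):
    """Return the most common label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def accuracy_and_fpr(y_true, y_pred):
    """Accuracy and false-positive rate for binary labels (1 = positive)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    false_pos = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    negatives = sum(t == 0 for t in y_true)
    fpr = false_pos / negatives if negatives else 0.0
    return correct / len(y_true), fpr
```

For example, if three trees vote `[1, 0, 1]` for a record, `majority_vote` returns `1`; comparing such predictions against known labels with `accuracy_and_fpr` yields two of the four evaluation factors listed above (prediction time and space complexity would be measured separately).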



Acknowledgements

The first author is thankful to the management of Kalasalingam Academy of Research and Education for providing fellowship.

Author information

Corresponding author

Correspondence to S. Arun Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Sahul Smys.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Arun Kumar, S., Venkatesulu, M. Gramian matrix data collection-based random forest classification for predictive analytics with big data. Soft Comput 23, 8621–8631 (2019). https://doi.org/10.1007/s00500-019-04014-2

  • DOI: https://doi.org/10.1007/s00500-019-04014-2
