Gramian matrix data collection-based random forest classification for predictive analytics with big data

Arun Kumar, S.; Venkatesulu, M.

doi:10.1007/s00500-019-04014-2


Focus
Published: 26 April 2019

Volume 23, pages 8621–8631 (2019)
Soft Computing

Abstract

Prediction is the process of analyzing current and past events to identify future events. Predicting future conditions remains an important step in many applications that aim to minimize risk. Several techniques have been developed for predictive analysis with big data; however, they do not achieve accurate prediction with low complexity when handling large volumes of data. To improve prediction accuracy while reducing complexity, a Gramian symmetric data collection-based random forest bivariate regression and classification (GSDC-RFBRC) technique is developed. First, a large volume of data is collected from the dataset, and a Gramian symmetric matrix is used to store the data in the rows and columns of a matrix. Classification and regression are then carried out with random decision forests to predict future outcomes. The regression step measures the relationship between the dependent variable (i.e., the outcome) and the independent variables (i.e., the data) through bivariate correlation. Based on this correlation, the random decision forest constructs a number of decision trees for classification, combines them and applies a voting scheme: the majority vote over the trees' classification results is taken as the final prediction in order to achieve high prediction accuracy. The technique is evaluated in terms of prediction accuracy, prediction time, false-positive rate and space complexity with respect to the number of data items (i.e., files). The results show that GSDC-RFBRC improves prediction accuracy and reduces prediction time, false-positive rate and space complexity.
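The abstract names the building blocks of GSDC-RFBRC (a Gramian symmetric matrix, bivariate correlation, and a random decision forest combined by majority voting) but gives no formulas. The following stdlib-only Python sketch illustrates those generic ingredients under stated assumptions; it is not the authors' algorithm. Assumptions: the Gramian matrix is read as the standard Gram matrix G[i][j] = <v_i, v_j> of the data vectors; "bivariate correlation" is read as the Pearson correlation between one feature and a binary outcome; each tree is a depth-1 stump that splits, at the mean, on whichever randomly sampled candidate feature is most correlated with the label. The function names (gram_matrix, pearson, majority, fit_stump, fit_forest, predict) and the toy data are hypothetical.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def gram_matrix(vectors):
    """Gram matrix G[i][j] = <v_i, v_j>; G is symmetric by construction."""
    n = len(vectors)
    g = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            g[i][j] = g[j][i] = dot  # compute the upper triangle, mirror it
    return g


def pearson(x, y):
    """Bivariate (Pearson) correlation; 0.0 if either variable is constant."""
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (sx * sy)


def majority(labels, default=None):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, n_candidates, rng):
    """Hypothetical depth-1 tree: among a random subset of features, split on
    the one whose |correlation| with the label is largest, at its mean value."""
    candidates = rng.sample(range(len(X[0])), n_candidates)
    cols = {f: [row[f] for row in X] for f in candidates}
    best = max(candidates, key=lambda f: abs(pearson(cols[f], y)))
    threshold = mean(cols[best])
    overall = majority(y)
    left = [lab for v, lab in zip(cols[best], y) if v <= threshold]
    right = [lab for v, lab in zip(cols[best], y) if v > threshold]
    return best, threshold, majority(left, overall), majority(right, overall)


def fit_forest(X, y, n_trees=25, n_candidates=1, seed=0):
    """Train each stump on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx],
                                n_candidates, rng))
    return forest


def predict(forest, row):
    """Each stump casts one vote; the forest returns the majority vote."""
    votes = [left_label if row[f] <= t else right_label
             for f, t, left_label, right_label in forest]
    return majority(votes)


if __name__ == "__main__":
    # Toy data: feature 0 separates the classes, feature 1 is a repeating pattern.
    X = [[i, (2 * i) % 5] for i in range(20)]
    y = [1 if i >= 10 else 0 for i in range(20)]
    print(gram_matrix([[1, 2], [3, 4]]))
    print(round(pearson([r[0] for r in X], y), 3))
    forest = fit_forest(X, y, n_trees=25, n_candidates=2)
    print(predict(forest, [0, 0]), predict(forest, [19, 3]))
```

Stumps keep the example short; a practical random forest grows deeper trees, but the bootstrap sampling and random feature selection shown here are the same. Note that scikit-learn's RandomForestClassifier aggregates its trees by averaging class probabilities rather than by the hard majority vote described in the abstract and used in this sketch.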

Fig. 1

Acknowledgements

The first author is thankful to the management of Kalasalingam Academy of Research and Education for providing a fellowship.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Kalasalingam Academy of Research and Education, Krishnan Kovil, Srivilliputtur, Tamilnadu, India
S. Arun Kumar
Department of Information Technology, Kalasalingam Academy of Research and Education, Krishnan Kovil, Srivilliputtur, Tamilnadu, India
M. Venkatesulu

Corresponding author

Correspondence to S. Arun Kumar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Communicated by Sahul Smys.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Arun Kumar, S., Venkatesulu, M. Gramian matrix data collection-based random forest classification for predictive analytics with big data. Soft Comput 23, 8621–8631 (2019). https://doi.org/10.1007/s00500-019-04014-2

Published: 26 April 2019
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s00500-019-04014-2
