Random forest for big data classification in the internet of things using optimal features


The internet of things (IoT) is a network of things that communicate through advanced communication technologies without human intervention. Effective data classification in IoT can uncover new and hidden knowledge and thereby benefit the medical field. In this paper, big data analytics for an IoT-based healthcare system is developed using the Random Forest Classifier (RFC) and the MapReduce process. E-health data collected from patients suffering from different diseases are considered for analysis. The optimal attributes are chosen from the database using the Improved Dragonfly Algorithm (IDA) for better classification. Finally, the RFC is used to classify the e-health data with the help of the optimal features. The implementation results show that the maximum precision of the proposed technique is 94.2%. To verify the effectiveness of the proposed method, different performance measures are analyzed and compared with existing methods.
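The abstract names MapReduce as the processing model but does not describe the job itself. As a rough illustration only, the map, shuffle, and reduce phases can be sketched in plain Python on a single machine. The record layout (a `disease` field) and the function names `map_phase`, `shuffle`, `reduce_phase`, and `run_job` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical e-health record, e.g. {"patient": "p1", "disease": "heart"}.
Record = Dict[str, str]


def map_phase(record: Record) -> Iterator[Tuple[str, int]]:
    """Map: emit one (disease, 1) pair per patient record."""
    yield record["disease"], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: sum the counts collected for one disease."""
    return key, sum(values)


def run_job(records: Iterable[Record]) -> Dict[str, int]:
    """Run the three phases in memory and return records per disease."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster the map and reduce functions would run on separate workers and the shuffle would be handled by the framework; the in-memory version only shows how the data flows between phases.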


  • \(S{p_i}\): Separation of the \(i\)th individual
  • \(P\): Current position
  • \({P_k}\): Position of the \(k\)th individual
  • \(N\): Total number of neighboring individuals in the search space
  • \({A_{li}}\): Alignment of the \(i\)th neighboring individual
  • \({V_k}\): Velocity of the \(k\)th individual
  • \({P^ - }\): Position of the enemy
  • \({P^+}\): Position of the food source
  • \(sw\): Separation weight
  • \(aw\): Alignment weight
  • \(cw\): Cohesion weight
  • \(Att\): Attraction, food factor
  • \(Dis\): Distraction, enemy factor
  • \(w\_CR\): Inertia weight-crossover rate
  • \(t\): Iteration count
  • \({f_{\text{max} }}\): Largest fitness value
  • \({f_p}\): Larger fitness value of the two individuals to be crossed
  • \({f_{avg}}\): Average fitness
  • \({f_{}}\): Fitness of the individual to be mutated
  • \({R_1},{R_2}\): Random values
  • \(V1,V2\): Random vectors that indicate the probability
  • \(F\): Margin function
  • \(I(\,)\): Indicator function
  • \({\arg _k}I({h_k}(V1)\): \({h_k}\) is the \(k\)th tree of the RF
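The separation, alignment, cohesion, food, and enemy terms listed above match the quantities of the standard dragonfly algorithm. The following is a minimal sketch of one position update in that standard formulation, not the paper's improved variant (IDA), whose modifications are not given in this excerpt. It assumes that \(Att\) and \(Dis\) act as scalar weights on the food and enemy terms and that a plain inertia weight `w` stands in for \(w\_CR\); the function name `dragonfly_step` and its argument names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dragonfly_step(
    position: Sequence[float],            # P, current position
    velocity: Sequence[float],            # current step vector
    neighbor_positions: Sequence[Sequence[float]],   # P_k
    neighbor_velocities: Sequence[Sequence[float]],  # V_k
    food: Sequence[float],                # P^+, position of food source
    enemy: Sequence[float],               # P^-, position of enemy
    sw: float, aw: float, cw: float,      # separation/alignment/cohesion weights
    att: float, dis: float,               # food and enemy factors (assumed scalar)
    w: float,                             # plain inertia weight (assumption)
) -> Tuple[Vector, Vector]:
    """Return (new_position, new_velocity) after one standard update."""
    n = len(neighbor_positions)           # N, number of neighbors
    if n == 0:
        # The standard algorithm falls back to a random walk here; omitted.
        raise ValueError("at least one neighbor is required")

    new_velocity: Vector = []
    for d in range(len(position)):
        # Separation: move away from neighbors.
        separation = -sum(position[d] - pk[d] for pk in neighbor_positions)
        # Alignment: match the mean neighbor velocity.
        alignment = sum(vk[d] for vk in neighbor_velocities) / n
        # Cohesion: move toward the neighborhood center.
        cohesion = sum(pk[d] for pk in neighbor_positions) / n - position[d]
        # Attraction toward food and distraction away from the enemy.
        attraction = food[d] - position[d]
        distraction = enemy[d] + position[d]
        new_velocity.append(
            sw * separation + aw * alignment + cw * cohesion
            + att * attraction + dis * distraction + w * velocity[d]
        )

    new_position = [p + v for p, v in zip(position, new_velocity)]
    return new_position, new_velocity
```

For feature selection, each position would typically be mapped to a binary mask over the attributes and scored by a fitness function such as classifier accuracy; that mapping is outside the scope of this sketch.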


Author information



Corresponding author

Correspondence to Naveen Chilamkurti.

Cite this article

Lakshmanaprabu, S.K., Shankar, K., Ilayaraja, M. et al. Random forest for big data classification in the internet of things using optimal features. Int. J. Mach. Learn. & Cyber. 10, 2609–2618 (2019). https://doi.org/10.1007/s13042-018-00916-z

  • Internet of things
  • Big data
  • E-health
  • Map reduce
  • Random forest classifier
  • Dragonfly algorithm
  • Optimization