Cyber forensics framework for big data analytics in IoT environment using machine learning

Multimedia Tools and Applications

Abstract

The growing volume of data generated by IoT-based platforms is testing the skills of forensic analysts. Tangible evidence sources usually have size limits, but communication traffic does not. This raises the need for efficient benchmarking of big data analysis. Solutions available to date have used an anomaly-based approach or have proposed approaches based on deviation from a regular pattern. To handle the volume of seized bytes, the authors propose an approach to big data forensics with high sensitivity and precision. The work presents a generalized forensic framework that uses Google’s programming model, MapReduce, as the backbone for traffic translation and for the extraction and analysis of dynamic traffic features. The technique is built on open-source tools such as Hadoop, Hive, Mahout, and R. Besides being open source, these tools support scalability and parallel processing. The paper also presents a comparative analysis of widely accepted machine learning models for P2P malware analysis in simulated real time. A dataset from CAIDA was processed in parallel to validate the proposed model, and the model’s forensic performance metrics show a sensitivity of 99%.
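The MapReduce-based feature extraction and the reported sensitivity metric can be illustrated with a minimal, single-process sketch. The packet record layout (source, destination, destination port, byte count), the per-flow features (packet count, total bytes, mean packet size, distinct destination ports), and all function names are hypothetical assumptions for illustration, not the authors' feature set or code. In a real deployment the map and reduce functions would run as distributed Hadoop jobs rather than in one Python process. The sketch also computes sensitivity, TP / (TP + FN), which is the metric the abstract reports, together with precision, TP / (TP + FP).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

# Hypothetical packet record: (src_ip, dst_ip, dst_port, n_bytes).
Packet = Tuple[str, str, int, int]
FlowKey = Tuple[str, str]


def map_phase(packets: Iterable[Packet]) -> Iterator[Tuple[FlowKey, Tuple[int, int, int]]]:
    """Map: emit ((src, dst), (1, byte_count, dst_port)) for every packet."""
    for src, dst, port, n_bytes in packets:
        yield (src, dst), (1, n_bytes, port)


def shuffle(pairs: Iterable[Tuple[FlowKey, Tuple[int, int, int]]]) -> Dict[FlowKey, List[Tuple[int, int, int]]]:
    """Shuffle: group intermediate values by flow key (done by the framework in Hadoop)."""
    groups: Dict[FlowKey, List[Tuple[int, int, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: FlowKey, values: List[Tuple[int, int, int]]) -> Tuple[FlowKey, Dict[str, float]]:
    """Reduce: aggregate one flow's values into illustrative traffic features."""
    packets = sum(v[0] for v in values)
    total_bytes = sum(v[1] for v in values)
    return key, {
        "packets": packets,
        "bytes": total_bytes,
        "mean_bytes": total_bytes / packets,
        "distinct_ports": len({v[2] for v in values}),
    }


def extract_features(packets: Iterable[Packet]) -> Dict[FlowKey, Dict[str, float]]:
    """Run map -> shuffle -> reduce and return features keyed by (src, dst)."""
    return dict(reduce_phase(k, vs) for k, vs in shuffle(map_phase(packets)).items())


def sensitivity_precision(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity, precision) for binary labels, where True means malicious."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


if __name__ == "__main__":
    sample = [
        ("10.0.0.1", "10.0.0.2", 6881, 1200),
        ("10.0.0.1", "10.0.0.2", 6882, 800),
        ("10.0.0.3", "10.0.0.4", 80, 500),
    ]
    for flow, feats in extract_features(sample).items():
        print(flow, feats)
    print(sensitivity_precision([True, True, True, True, False],
                                [True, True, True, False, False]))
```

For the sample labels above, three of the four malicious flows are detected and there are no false alarms, so the call returns a sensitivity of 0.75 and a precision of 1.0. Keying the reduce step on the (source, destination) pair keeps each flow's aggregation independent, which is what allows the work to be spread across reducers.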

Figures: Fig. 1, Fig. 2, Fig. 3, Fig. 4, Fig. 5, Fig. 6, Fig. 7, Fig. 8

Author information

Corresponding author

Correspondence to Gurpal Singh Chhabra.

About this article

Cite this article

Chhabra, G.S., Singh, V.P. & Singh, M. Cyber forensics framework for big data analytics in IoT environment using machine learning. Multimed Tools Appl 79, 15881–15900 (2020). https://doi.org/10.1007/s11042-018-6338-1

  • DOI: https://doi.org/10.1007/s11042-018-6338-1
